Meaningful characterisation of perturbative theoretical uncertainties

We consider the problem of assigning a meaningful degree of belief to uncertainty estimates of perturbative series. We analyse the assumptions which are implicit in the conventional estimates made using renormalisation scale variations. We then formulate a Bayesian model that, given equivalent initial hypotheses, allows one to characterise a perturbative theoretical uncertainty in a rigorous way in terms of a credibility interval for the remainder of the series. We compare its outcome to the conventional uncertainty estimates in the simple case of the calculation of QCD corrections to the e+e- ->hadrons process. We find comparable results, but with important conceptual differences. This work represents a first step in the direction of a more comprehensive and rigorous handling of theoretical uncertainties in perturbative calculations used in high energy phenomenology.


Introduction
The Large Hadron Collider (LHC) has finally been fired up and, no black hole having swallowed the Earth, the race to collect data and analyse them has now started in earnest. While the short term goal is to rediscover the Standard Model, the long term one will JHEP09(2011)039 of course be to find signals of "new physics", be it the Higgs boson, supersymmetry, or something else, more exotic and possibly unexpected. While it is everybody's hope that discoveries will announce themselves in the form of unambiguous signals, it is of course conceivable, and probably also unavoidable initially, that they may rather present themselves cloaked under some subtle data/theory discrepancy. If this is the case, a full control of the uncertainty of the theoretical predictions becomes naturally of paramount importance: when comparing an experimental measurement to a theoretical calculation, we must be able to say if they agree or not, and with what degree of confidence we are making such statement. This is impossible to achieve unless both the experiment and the theory are provided with a meaningful (and commonly accepted) degree of uncertainty.
While most of what we discuss below can apply to any kind of theoretical prediction in perturbation theory, we will specialize it to the context of Quantum Chromodynamics (QCD): many LHC processes and backgrounds pertain to the QCD realm and, due to the relatively large size of the QCD coupling α s and therefore the slower perturbative convergence, the issue of theoretical accuracy is more pressing. Theoretical predictions in QCD contain multiple ingredients, inputs that must be ultimately extracted from experimental data, like Parton Distribution Functions (PDFs) for hadronic collisions and the value of α s . In the past several years a lot of progress has been made on α s and the PDFs. The uncertainty with which we know the coupling is now quite small (see for instance [1]). Moreover, several groups [2][3][4][5][6] have extracted PDF sets with associated uncertainties of experimental origin, and provided frameworks to properly propagate them to the observable one is calculating. Huge progress has also been made in performing higher order perturbative calculations for a large number of phenomenologically interesting observables [7], thereby potentially improving the accuracy with which they are known.
One area where progress has instead arguably not been made is in an understanding of the meaning of the residual theoretical uncertainty, given by unknown higher orders in perturbation theory. This uncertainty is usually estimated by varying unphysical momentum scales (we will denote them collectively by µ) contained in the perturbative result, like the renormalisation and the factorisation scales, around a central value µ 0 , usually taken to coincide with a physical momentum scale Q of the process. The method, the range in which to vary the scales (typically [µ 0 /2, 2µ 0 ]), and their central value µ 0 = Q are highly conventional, but nevertheless quite commonly accepted. They allow the community to efficiently exchange a conventional uncertainty which can be easily compared between different calculations.
Among the shortcomings one may find in this procedure, the most glaring one is probably that it does not allow one to estimate the degree of belief (DoB) of the resulting uncertainty band. By this we mean that it is not possible to associate a value, 68.3%, 95.5% or 99.7% for example, to our belief that the uncertainty band contains the true sum 1 of the JHEP09(2011)039 series. This lack of a proper characterisation of the perturbative theoretical uncertainty also means that procedures to combine it with other sources of uncertainties (e.g. the value of the coupling and the PDFs) are at best ambiguous or controversial, as exemplified by a recent discussion [8] about the proper way to estimate the total uncertainty of the prediction for the Higgs production cross section at hadron colliders. All this makes it potentially impossible to fully and rigorously assess our degree of belief that an experimental result may agree (or not) with theory, making betting (or, in order to cover both sides, offering odds) on new physics having been discovered or not an altogether unscientific -and potentially risky -proposition.
The purpose of this paper is precisely to try to make such a bet potentially safe, consistently with the coherent bet idea of de Finetti [9]. To achieve this we construct a model that leads to a well defined credibility measure for a perturbative theoretical uncertainty, so that the degree of belief of a given interval can be explicitly calculated. In section 2 we first review the commonly used theoretical uncertainty estimation via unphysical scales variations, and subsequently proceed to define the Bayesian model from which we then extract our credibility distributions. Section 3 compares the results of our credibility-based model with those of the conventional method, allowing one to assign a degree of belief to the uncertainty bands given by the latter. Some results for the e + e − → hadrons process are given in section 5, to better illustrate the model with a realistic example. Section 6 discusses some of the hypotheses that were made in building the model, and section 7 extends it to the case of partial knowledge of higher order coefficients.
Before closing this introduction we wish to stress the following point: we are not trying to improve our knowledge of a perturbative prediction by adding physical information or even just speculations about its form, or by (improbably) seeking physical content inside the mathematical formalism: the only information that enters the result is what has been explicitly calculated, i.e. the known coefficients of a perturbative series. To this information we add hypotheses meant to formalize assumptions that are often implicitly made when estimating theoretical uncertainty using scale variations, and we use the framework of Bayesian probability (see section 2.2) for computing from them and from the available information the degree of belief of given uncertainty intervals. The hypotheses need not even be strictly true (or people may disagree about them), but once they are made the path to the calculation of the degree of belief values is a rigorous one.

Theoretical uncertainty estimates
For definiteness, consider the perturbative calculation for the cross section of a process taking place at a hard scale Q (see footnote 1 for a comment about the asymptotic nature of QCD series):

1)
result beyond what has been really calculated.

JHEP09(2011)039
where µ R is the renormalisation scale (which we shall in the following simply denote by µ), and the coupling α s (µ) evolves according to A concrete example would of course be the production of hadrons in e + e − collisions. When no dependence is given explicitly, the coefficients and the coupling will be considered to be evaluated at a renormalisation scale µ = Q: Given c n ≡ c n (Q, Q) independent of µ, one can always reinstate the full µ dependence and determine c n (Q, µ) using where c n,0 = c n and (see appendix A for a derivation). Note that this last equation uses all the coefficients c j with j < n. We will also denote by c n (Q, µ)α n s (µ) (2.6) (or, for short, σ k ≡ k n=0 c n α n s for µ = Q) the partial sum up to the last calculated perturbative order k, and by the remainder.

Conventional theoretical uncertainty estimate
The explicit µ dependence of σ k (Q, µ) in eq. (2.6) serves as a reminder that, when truncated to a finite order, a perturbative calculation retains a higher-order dependence on the scale µ. This dependence is generally exploited to estimate its "uncertainty", 2 i.e. the presumed value of ∆ k . In order to do so, one typically quotes an uncertainty interval [σ − k , σ + k ] around σ k (but not necessarily centred on it). The specific choices for σ ± k can vary. Possible options are: 4. Same as eq. (2.10), but with In the last two cases the interval is centred on σ k (Q, Q), whereas in the first two it is not necessarily so. Note also that the choice of varying the scale µ within a factor of two around the physical scale Q, i.e. in the range [Q/2, 2Q], is fully conventional. A priori there is no reason why the interval [σ − k , σ + k ] should represent a sensible estimate of the remainder ∆ k of the series since, from a purely mathematical point of view, δ k (or σ k ) does not contain any information about ∆ k : σ k (Q, µ) and δ k are functions of the c n for n ≤ k, while ∆ k is a function of the c n for n > k. However, the reason why this can instead often work in practice is that, under certain circumstances, the size of δ k can be similar to the size of ∆ k . One can indeed show that (see appendix A) where the factor of 3 in the last term has been obtained by approximating the exact expression 4 ln 2 (This factor would be replaced by 4 ln r if the scale µ were varied in the range [Q/r, rQ]). The last equality above is obtained by making the assumption that all the coefficients in the series share the same magnitude and that α s is reasonably small. Under these same hypotheses (and therefore |c k+1 | |c k |), we can also write (2.14) Experience with perturbative calculations in QCD has shown that theoretical uncertainty estimates like those of eq. (2.10) are quite successful in predicting the range in which a higher order result will fall. This can then be seen as an empirical validation of the assumption made above, i.e. that |c k+1 | is indeed often of the same magnitude as |c k |. The limitation of this conventional approach is that, even if the hypothesis |c k+1 | |c k | is correct and therefore δ k correctly describes the size of the remainder of the series, there is no way of deciding how reliably it may do so.

Credibility-based theoretical uncertainty estimate
In this paper we use the "Bayesian probability" (also called "subjective probability" or "degree of belief" or "credibility", see e.g. [10]), and distinguish it from the "frequentist probability". The two concepts share the same mathematical formalism, but are nonetheless distinct. Bayesian probability is not linked to an infinite number of realizations of an experiment. It deals with a particular question, which may or may not be about the result of one particular realization of a given experiment, and the consequences of the information one considers about its possible answer. 3 This information is not necessarily rigorous or "true" in any way, but its treatment, once translated mathematically into the so called "priors" and "likelihoods", is. A distribution of frequentist probability (or, for instance, its variance) gives a measure of the reproducibility of an experiment. Conversely, a credibility distribution conveys information about the uncertainty of the answer to a question, for instance the result of one particular realization of an experiment, prior to its execution. The variables appearing in a frequentist probability distribution are commonly denoted as random variables, since they take different values in different realizations of the experiment. We call instead uncertain variables the ones in a credibility distribution, to better make the distinction with the former ones: their values are not random (each of them being a single number), but simply unknown.
Given a density function f , the degree of belief (or "credibility") that the value of an uncertain variable η belongs to the interval [a, b] is then equal to where the result is a number between zero and one. 4 In this paper we will always work within the concept of degree of belief as defined above, and will never use the frequentist probability. The latter is not applicable to the case of a theoretical uncertainty, which is not amenable to a frequentist treatment (there is nothing one can "repeat") and is much more akin to a systematic uncertainty instead.

The model
The goal of this paper is to establish a conditional density f (∆ k |c 0 , . . . , c k ) for the value of remainder of the series ∆ k in eq. (2.7), given the knowledge of the coefficients of the perturbative expansion up to order k, and study its behavior. The reason for introducing a 3 One may build his initial credibility distribution using information of frequentist origin: e.g. after throwing an unbiased six-sided dice a large number of times (and hence establishing a frequentist probability), one can come to believe (i.e. define a credibility measure) that there is a one-in-six chance that a given number will show up in a subsequent throw (i.e. set the credibility measure equal to the frequentist probability previously established). However, information of non-frequentist origin can also be included in a credibility-based approach: if someone is told that the dice is likely crooked, they can then adjust their expectations (degree of belief) using this information, even before throwing the dice a single time. 4 Note that, while this may seem similar to the "confidence level intervals" of frequentist statistical analyses, our intervals are to be understood strictly within a Bayesian framework (where they can also be called "credible intervals" at a given level), and should not be confused with the frequentist ones, which (see e.g. [11]) do not in fact express a "level of confidence".

JHEP09(2011)039
density function is that it contains much more information than a simple uncertainty band like the one established in eq. (2.10) in section 2.1. To achieve this we will create a generic credibility measure, applicable to any possible perturbative series, over the space of a priori unknown coefficients c 0 , c 1 , . . . . More precisely, we will create a density function f (c 0 , c 1 , . . . ), normalised such that f (c 0 , c 1 , . . . ) dc 0 dc 1 · · · = 1 (2.16) and whose parameters can be marginalised according to In the case of one particular physical process, some coefficients will have been already computed up to order k: c true 0 , . . . , c true k . The credibility measure for this particular process will be the inherited measure defined on the subspace corresponding to c 0 = c true 0 , c 1 = c true 1 , . . . , c k = c true k . For brevity, we will use the notation f (c k+1 |c 0 , . . . , c k ) instead of f (c k+1 |c 0 = c true 0 , . . . , c k = c true k ). The density over the still unknown c k+1 coefficient will then read, according to the standard conditional density rule, To create this generic measure, we focus on the observation made at the end of section 2.1 (i.e. that the empirical success of conventional theoretical uncertainty estimates made using scale variations can be explained by the fact that successive perturbative coefficients have similar magnitudes) and reverse it. We make the assumption that all the coefficients c n in a perturbative series share some sort of upper boundc > 0 to their absolute values, specific to the physical process studied. 5 The calculated coefficients will give an estimate of thisc, restricting the possible values for the unknown c n . The set of uncertain variables that define the space on which we will create our credibility measure is thus the set constituted of this parameterc and of all the a priori unknown coefficients.
The model rests on three hypotheses:

Residual uncertainty
We suppose that, if we happened to know beforehand the parameterc, our residual density for the value of an unknown coefficient c n would be in the form of a uniform distribution,

JHEP09(2011)039
where χ A is the characteristic function of a set A. We could (and probably should) use a density function that does not vanish anywhere, like a Gaussian distribution, but the form (2.20) leads to simpler expressions, so we use it in the following to study the model analytically. 6

Shared information and independence
The parameterc models information that we consider to be shared by all coefficients, and we make it the only one. Whenc is known, the residual uncertainties on the values of two coefficients c n and c n are then totally independent. In fact, we will suppose the coefficients to be mutually independent, so that for a set of coefficients The value ofc is the maximal information that the coefficients share. It corresponds to the maximal knowledge one could extract from the known coefficients c 0 , . . . , c k in order to "predict" the possible values of unknown ones c n , n > k.

Hidden parameter
The value of thec parameter is "hidden" in the knowledge of the c n . As long as we have not calculated any coefficient, we can only say that it is a positive real number, and that all values for its order of magnitude are a priori equally probable. In order to implement this in practice we define a density for its logarithm as the limit of a uniform distribution between | ln | and −| ln | when a small parameter tends to zero: We will perform calculations (both analytical and numerical) using this -dependent density f with = 0, and the final result will then be the limit → 0. The vanishing of a density in this limit would mean that we do not have enough information to make any guess about the result. For example, f (lnc) tends to a "uniformly null" density, meaning that that when no coefficients are known we have no information whatsoever about the possible value of lnc.
The three hypotheses (2.20), (2.21) and (2.22) define completely the credibility measure over the whole space of a priori uncertain variables {c, c 0 , c 1 , . . . }. They then define every possible inherited measure on a subspace associated with a physical process whose first coefficients are known. Section 6 will revisit the choices made in building this model, for instance the choice of a uniform distribution for lnc rather than forc (eq. (2.22)), and study some alternatives and their consequences on the results.
The following subsections are dedicated to deriving from these hypotheses the densities f (c|c 0 , . . . , c k ), f (c n |c 0 , . . . , c k ) for n > k, and finally the residual theoretical uncertainty of a perturbative prediction calculated up to order k, f (∆ k |c 0 , . . . , c k ). (2.24) Let us derive for instance the second of these results. The conditional density for a generic (uncalculated) coefficient c n , n > k, is by definition (see eq. As stated in the previous subsection, we perform all calculations with = 0 and we take the → 0 limit at the end. From eq. (2.17) and the property of conditional densities, similar to eq. (2.19), we have Using the factorisation property (2.21) and the definitions (2.20) and (2.22) we get (2.28) We therefore can write, using eq. (2.25), Note that in this equation the value k+1 represents the total number of known perturbative coefficients c 0 , . . . , c k used to estimate c n with n > k, rather than simply one unit above the last calculated perturbative order k. Similarly, k + 2 is this total number plus one. If a series starts at a non-zero order α l s , its last known perturbative order k plus one will not give anymore the number of known coefficients. We will detail in section 4 the modifications to be made in this and in the following equations to account for such a case.
The derivation of eq. (2.29) is given in some more detail in appendix B.1. The full derivation of eq. (2.23) is given in appendix B.2. From now on we will collect the derivations of the densities and uncertainty intervals in appendix B, since we do not wish to focus on the technicalities of the derivation of the conditional densities but rather on their behavior. Definingc (k) ≡ max(|c 0 |, . . . , |c k |) we can rewrite the densities (2.23) and (2.24) as .
(2.31) Figure 1 shows the two distributions f (c n |c 0 , . . . , c k ) and f (c|c 0 , . . . , c k ), plotted as functions of c n andc respectively, for different values of k, assuming thatc (k) remains equal to one.c (k) acts as an estimate ofc, and the density f (c|c 0 , . . . , c k ) can be seen to tend to a Dirac distribution concentrated atc =c (k) when k goes to infinity. The more coefficients are known, the more preciselyc is estimated. In the same k → ∞ limit the density over the unknown c n tends to f (c n |c =c (k) ) as given in eq. (2.20), the distribution that corresponds by construction to the remaining uncertainty when the whole of the hidden information simulated byc is known. For a finite value of k, the density is always wider than this limit: the uncertainty about unknown coefficients c n is larger when one knows the values of only a few coefficients c 0 , . . . , c k than when one posses the full information about the value ofc.
The remainder ∆ k of the perturbative series depends on the values of all the unknown coefficients c k+1 , c k+2 , . . . . Its density can be written as (2.32) This expression is too complicated to be handled analytically, even in the case of the simple choice of density in eq. (2.20) for the coefficients. However, making the approximation (2.33) and using eq. (2.31) for f (c n |c 0 , . . . , c k ) with n = k + 1, we obtain This result depends on the entire set of the calculated coefficients via the parameterc (k) = max(|c 0 |, . . . , |c k |).
The knowledge of f (∆ k |c 0 , . . . , c k ) allows one to calculate the smallest p%-credible interval for ∆ k . It turns out to be centred at zero, and hence we denote it by [−d and one finds, using the analytical approximation in eq. (2.34) (see appendix B.3) where, of course, p% ≡ p/100 and p is a number between 0 and 100.
The result for f (∆ k |c 0 , . . . , c k ) in eq. (2.34) can be generalised to any choice of f (c n |c) and f (c), i.e. beyond the choices of eqs. (2.20) and (2.22). Using the derivation given in appendix B.4 we obtain, still within the approximation of eq. (2.33), 37) where we have now explicitly allowed for the possibility of expressing intermediate quantities as a function of (eq. (2.32) was instead already written in the → 0 limit). This expression will be used for the numerical evaluations of densities and credible intervals proposed in the Mathematica package available from the authors.
Since the results in eqs.   Figure 2 shows the numerical results for k = 0, 1 and 2 and the corresponding analytical approximation for f (∆ k |c 0 , . . . , c k ) in eq. (2.34). We can see that the agreement is extremely good, especially when small (realistic) value of α s are used. We will therefore rely on the approximation of equation (2.33) for our predictions of densities for ∆ k in the rest of this paper.

Comparison with the conventional method
In deriving the density for ∆ k in the previous section we made no reference to the scale variation δ k of the partial sum σ k (Q, µ) which is usually employed in the conventional uncertainty estimate [σ − k , σ + k ] of section 2.1. In order to assess the compatibility of the two methods, we now wish to study the relation between the density for ∆ k and an interval of the kind [σ − k , σ + k ]. Given a specific series and a set of coefficients (c 0 , . . . , c k ) we wish to evaluate is the degree of belief of the scale variation interval, forc = 1, for k = 1 (left) and k = 2 (middle) and k = 3 (right). Each plot is obtained using N = 10 4 samples. and, for definiteness, we now take [σ − k , σ + k ] as the interval given by eq. (2.8), so that we can set Since the shape of σ k (Q, µ), and therefore the values of ∆ − k and ∆ + k , depend on all the values of the calculated coefficients (c 0 , . . . , c k ), while the density function f (∆ k |c 0 , . . . , c k ) depends only on their maximumc (k) (see eq. (2.34)), it is important to make sure that different sets of coefficients, sharing the samec, do not typically lead to broadly different estimates for the degree of belief C(∆ k ∈ [∆ − k , ∆ + k ]|c 0 , . . . , c k ) in eq. (3.1). To test this in practice, we evaluate the integral (3.1) for many different configurations of the coefficients (c 0 , . . . , c k ), in the casec = 1. Figure 3 shows the distribution of the values of the degrees of belief that are obtained for the three perturbative orders k = 1, 2, 3. The typical degree of belief C(∆ k ∈ [∆ − k , ∆ + k ]|c 0 , . . . , c k ) predicted by the model can be seen to be largely unaffected by the precise values of the coefficients, its distribution taking an almost Dirac-delta shape at the values 0.57, 0.96 and 0.99 for k = 1, 2, 3 respectively. The peak at x 3 0 in the rightmost plot can be understood as an artifact due to configurations where σ 3 (Q, 2Q) and σ 3 (Q, Q/2) are accidentally close to each other, resulting in a vanishing [∆ − k , ∆ + k ] interval. It can be made to disappear by modifying the choice of the interval and using instead [ i.e. corresponding to eq. (2.9) rather than eq. (2.8).
The numerical results of figure 3 can be understood through the following analytical approximations. First, we modify slightly the interval [∆ − k , ∆ + k ] considered above. We make it symmetric around σ k and of width δ k /2, where δ k is given in eq. (2.13), and we consider

JHEP09(2011)039
Using eq. (2.34), we get (see appendix B.5) (3.5) This result is fully independent of the coefficients in the approximation (or in the case) where c k =c (k) . It predicts the Dirac distribution-like shape observed in figure 3 and the variation of its position with the value of k. For k = 1, k = 2 and k = 3, it gives for the degree of belief (3.5) the values 61% (using the lower expression in eq. (3.5)), 96% and 99.6% (using the upper expression in eq. (3.5)) respectively, using β 0 = 0.61 and c k =c (k) . These values are in good agreement with those obtained from the numerical estimates of the exact densities in figure 3. The k-dependence of the result in eq. (3.5) shows that the degree of belief of the interval [σ k − δ k /2, σ k + δ k /2] is not a constant, but depends instead on the perturbative order at which we are working. When calculating higher orders in a perturbative series not only the size of the conventional residual uncertainty decreases, but also its degree of belief, as evaluated by our new method, increases. Note also that this method of evaluating the degree of belief of an uncertainty interval avoids one specific shortfall of the conventional method, namely that its estimate of the theoretical uncertainty may become unreasonably small if the last calculated coefficient happens by accident to be much smaller than the others or if accidental cancellations take place.
It would be tempting to consider eq. (3.5) as the main results from our Bayesian model, allowing one to associate a degree of belief to the uncertainty bands given by the conventional method. The simplicity of these equations, and their numerical values tantalisingly (though entirely accidentally) close to the confidence levels of Gaussian sigmas, make them apparently good candidates for such an identification. However, it is important to bear in mind that these equations depend on the choice made for the density function in eq. (2.20), as well as on the various approximations made in deriving them. As such, they cannot be considered as strictly valid in general, although they offer a very useful first approximation when trying to gauge the degree of belief of a conventional uncertainty band generated by scale variations.
In practice, one would like to be able to abandon the scale variations method altogether, and determine the degree of belief of any interval of his choosing. In general we will therefore not use eq. (3.5), but rather estimate any desired p%-credible interval numerically using the density function (2.37), without any reference to the conventional method. When this is the case, only k + 1 − l coefficients (rather than k + 1) are calculated when the series is known up to perturbative order k. The results of our model given in the previous section should then be modified as follows.

Eqs. (2.23) and (2.30) become
f (c|c l , . . . , c k ) = n c (max(|c l |, . . . , |c k |)) nc c nc+1 χc >(max(|c l |,...,|c k |)) = n cc where we have introduced the number of known coefficients, Note also thatc (k) should now formally be defined as max(|c l |, . . . , |c k |). We have not changed its notation so as not to proliferate the number of different symbols. Eqs. (2.24) and (2.31) are similarly modified to f (c n |c l , . . . , c k ) = 1 2 n c n c + 1 (max(|c l |, . . . , |c k |)) nc (max(|c n |, |c l |, . . . , |c k |)) nc+1 = n c n c + 1 (4.5) and from this one derives the result corresponding to eq. (2.36) for the width of the smallest p%-credibility interval: Finally, using the result in eq. (2.13), δ k 3kβ 0 α k+1 s |c k |, which is unmodified by the fact that the series starts now at order l, one finds that the degree of belief associated to the interval given by the conventional scale-variation method, already given in eq. (3.5) for the l = 0 case, is modified as (4.7) For a process starting at order α s (i.e. l = 1) this equation predicts a degree of belief of 46% at LO (k = 1), using the lower expression in eq. (4.7), 90% at NLO (k = 2) and 98.8% at NNLO (k = 3), using in both cases the lower expression in eq. (4.7) and c k =c (k) . For a process starting at order α 2 s (i.e. l = 2) one predicts instead a degree of belief of 73% at LO (k = 2), 96% at NLO (k = 3) and 99.5% at NNLO (k = 4). In this case the upper expression in eq. (4.7) always applies.

A realistic application: e + e − → hadrons
The total cross section σ(e + e − → γ → hadrons) is one of the best known observables in perturbative QCD, its coefficients being known exactly up to order α 3 s , and even c 4 being known approximately. This process is therefore an ideal place where to test the behaviour of our Bayesian model, and compare it to the results of the conventional uncertainty estimate.
We write this cross section as One can compare these "uncertainty intervals" with the successive perturbative results given above, and see that indeed order by order the higher-order result is inside the interval. 8 This is as far as the conventional uncertainty estimate can go. At this point one can use our model to do one of two things (or both): either one calculates the degree of belief of the intervals given above, or one finds the intervals corresponding to given values of degree of belief, for instance 68.3%, 95.5% and 99.7%.
For the first option, we find Note that in [13] σQCD is denoted by δQCD instead. We have modified the notation to avoid confusion with our own definition for δ k , which represents an uncertainty interval rather than a value of the truncated series. 8 These uncertainty intervals have been evaluated by using in all cases an evolution equation for αs up to β2, i.e. what is needed for σQCD,3. One may have used lower-order accuracies when dealing with σQCD,1 or σQCD,2, but we have explicitly checked that this changes at most the last significant figure in the numbers given above. These values have been obtained via numerical integration of the density f (∆ k |c 1 , . . . , c k ) in eq. (4.5). One can compare them to the values given by the analytical approximations in eq. (4.7) for l = 1 and k = 1, 2, 3 respectively, i.e. n c = k + 1 − l = 1, 2, 3. One obtains 45.8%, 54.8% and 98.8% respectively. The first two values (both obtained with the lower expression in eq. (4.7)) can be seen to be in good agreement with the exact results. The big discrepancy for the third one can be explained with the fact that the actual interval [σ − QCD,3 , σ + QCD,3 ] happens to be quite asymmetric with respect to the central value σ QCD,3 , whereas in the approximation (4.7) the interval [−δ k /2, δ k /2] is symmetric. This serves as a remainder that one should always resort to the full numerical evaluation of the degree of belief whenever accurate results are sought.
For the second option, we find, using the notation C and f (c). We made there the choice of using a flat prior for lnc (rather thanc itself) in eq. (2.22), and for c n instead in eq. (2.20). We discuss below the reasoning behind these choices.

Choice of the density function f (c n |c)
The choice of exactly what variable to use to express a prior density, e.g. the logarithm of a parameter rather than the parameter itself, is related to the assumed nature of said parameter. Suppose we used ln c n instead of c n in eq. (2.20) and defined a uniform density f (ln c n |c, h) = 1 2h χ lnc−h≤ln cn≤lnc+h (6.1) where h is an arbitrary parameter. This means to consider it as likely to find c n between c/ exp(h) andc as betweenc andc exp(h). One can debate whether this behaviour is more or less appropriate than the one used in eq. (2.20) where c n is equally likely to lie between −c and zero as between zero andc. However, the main drawback of eq. (6.1) is that it requires the introduction of a new, a priori unknown, parameter h which controls the spread of the coefficients. At least three perturbative coefficients would then need to be known before the model can estimate a credibility interval.
We have therefore concluded that the hypothesis in eq. (2.20) not only already describes sufficiently well the observed typical relations between perturbative coefficients, but also JHEP09(2011)039 provides the simplest model (simplicity being a strong guiding principle of our model, as we wish to be able to control well the information we introduce into it).

Choice of the density function f (lnc)
The value of the sum of a perturbative series depends on the value ofc. Choosing a density forc which is uniform inc itself rather than in its logarithm amounts to trying to predict the precise value of such a series rather than just its order of magnitude. We find the former too strong a constraint, and prefer therefore to limit ourselves to the second choice. On a technical side, we also find that when using a prior uniform inc one then needs at least two calculated coefficients in order to have a non-null density on its theoretical uncertainty, whereas in the lnc case one coefficient is already sufficient to give an indication about the order of magnitude of the higher order coefficients and therefore about the remainder of the series.
Note also that it is sufficient to use in eq. (2.22) a density f (lnc) that is uniform in lnc only in the → 0 limit. For finite values of this requirement is not necessary.

Choice of the expansion parameter
Another modification of the model would be to use an expansion parameter that differs from α s , so that σ = c n α n s = (λ n c n ) wherec is a parameter that applies to the new set of coefficients c n . This choice would not modify the expressions for the densities over the unknown coefficients f (c n |c 0 , . . . , c k ), but only the one over the residual sum f (∆ k |c 0 , . . . , c k ) since the approximation (2.33) would now read ∆ k c k+1 α s λ k+1 (6.5) so that eq. (2.34) is replaced by Different values for λ relate to different speeds of convergence of the series, either quicker (λ > 1) or slower (α s < λ < 1). Of course one must be careful not to end up with an expansion parameter which is too large, because this will eventually invalidate the use of the approximation in eq. (2.33).

JHEP09(2011)039 7 Partially known higher orders
The model we have considered so far assumes perfect knowledge of some coefficients, up to order k, and total ignorance of those of higher order. In practice, it is often possible to know part of a higher order coefficient, typically calculated within some approximation or obtained as an expansion of an all-order resummation. It is yet straightforward to extend the model to account for such cases.
Two new building blocks are required to adapt the model. First of all, ifc k+1 is an approximation of c k+1 it should not provide more information than the true value c k+1 itself. If the real value c k+1 is known, knowledge of the approximate valuec k+1 must not change anything: for a set of coefficients {c i } it must hold Secondly, one must decide how reliable a given approximationc k+1 of c k+1 is. We must introduce a density function f (c k+1 |c k+1 ) for the valuec k+1 , given the true c k+1 . The choice of this density will depend on the wayc k+1 was obtained. One possible parametrisation is for instance the log-normal density for some chosen value of the parameter f . It more or less corresponds toc k+1 estimating c k+1 up to a factor of order f . The densities on the true value of the coefficient c k+1 which is known only approximately, and on the completely unknown coefficients c n can then be written, up to normalisation factors collectively denoted by N , as (see appendix B.6) f (c k+1 |c 0 , . . . , c k ,c k+1 ) = N f (c k+1 |c 0 , . . . , c k )f (c k+1 |c k+1 ) (7.5) f (c n |c 0 , . . . , c k ,c k+1 ) = N f (c n , c k+1 |c 0 , . . . , c k )f (c k+1 |c k+1 ) dc k+1 (7.6) More generally, for arbitrary sets of known coefficients C K = {c i } i∈[0,k] , approximations C A = {c i } i∈A of coefficients C A = {c i } i∈A and totally unknown coefficients To get the density for the unknown coefficients, one then just integrates over C A . To get the density over ∆ k one replaces eq. (7.7) in the definition (2.32). Let us study for instance the case of one known coefficient c 0 and one approximationc 1 . We want to obtain the density f (∆ 0 |c 0 ,c 1 ). Depending on how much we trust the approximation, the uncertainty over c 1 or c 2 will predominate. In general, we need to keep track of both these coefficients in the expression for ∆ 0 . We do not use the approximation (2.33) but rather the more accurate one ∆ 0 c 1 α s + c 2 α 2 s . The result for the density is then The expression for f (c 1 + c 2 α s = x|c 0 ,c 1 ) can be obtained as where we defined c 1 (x, c 2 ) ≡ x−α s c 2 to simplify the expression and we used equation (7.7).
The result for f (c 2 , c 1 (x, c 2 )|c 0 ) is obtained in a similar way to f (c n |c 0 , . . . , c k ) in (2.31): Using eq. (2.27) for k = 0 and k = 2 we find where we have definedc (2) In order to see how this works in practice, we choose c 0 = 0.9 andc 1 = 0.83 and we plot eq. (7.12) as a function of x = ∆ 0 /α s for various values of f . Figure 5 shows how the density is modified whenc 1 is more and more trusted (f is smaller and smaller). Consider

JHEP09(2011)039
at first the top left plot. When the approximation is not much trusted and f is large (f = 50 in this case), the only useful information that we can get fromc 1 is the sign of c 1 . The total uncertainty over ∆ 0 /α s c 1 + α s c 2 is largely dominated by the uncertainty over c 1 . Its density coincides with two times the density f (c 1 |c 0 ) for positive values of x and with f (c 1 |c 0 ,c 1 ). The differences, originating from the uncertainty over c 2 or from the small information provided by the particular value ofc 1 , are almost negligible (except around zero).
Asc 1 gets more trusted, the uncertainty decreases and the degree of belief for ∆ 0 /α s to have a value aroundc 1 increases (top middle plots). The uncertainty over c 1 is still the limiting one but the information provided byc 1 is not negligible anymore. The full density still coincides with f (c 1 |c 0 ,c 1 ) but is starting to be different from f (c 1 |c 0 ).
Then comes a limit where the uncertainty over c 1 is of the same order as the one over α s c 2 (top-right and bottom-left plots. They are the same plot, but shown for different scales on the x-axis). The full density over c 1 + α s c 2 now differs from f (c 1 |c 0 ,c 1 ) and its width is given both by this density and by the one over f (c 2 |c 0 , c 1 =c 1 ).
Asc 1 is considered to be a better and better approximation, the uncertainty over c 2 prevails. The difference between c 1 andc 1 is now negligible and the total density approaches the shape of f (c 1 + α s c 2 |c 0 , c 1 =c 1 ) (bottom middle and right plots).

Conclusions and outlook
In this paper we have introduced a Bayesian model which allows one to characterise in terms of intervals of a given degree of belief (or degree of belief of a given interval) the residual theoretical uncertainty of a perturbative calculation. Our aim is to put on more solid ground the estimate of the uncertainty of a known result, not to improve in any way the calculation itself. This we try to achieve by formalising hypotheses on the behaviour of the coefficients of perturbative series, and then by deriving from these hypotheses the degree of belief values in a rigorous way.
We have chosen to try to translate as closely as possible into our model the assumption which is implicitly made when employing the conventional method (scale variations) for estimating the uncertainty, namely that successive coefficients of a perturbative series tend to have similar size. One may or may not believe this hypothesis to be well grounded, and our choice is not necessarily true or even just the best possible one. However, what matters, and what this paper wants to provide, is not so much which hypotheses are made, but rather the formalism that allows one to derive from them a proper characterisation of the residual theoretical uncertainty: our framework can then be considered as a box into which to input one's favourite hypothesis about the behaviour of a series, and from which to extract the appropriate degree of belief values.
We have found that, under the quite general assumption mentioned above and within the Bayesian framework formalised in section 2.2.1, the p%-credible interval [σ k − d k ] of a series calculated up to order k, σ k = c l α l s + · · · + c k α k s , with n c = k + 1 − l the number of known perturbative coefficients. The full credibility distribution can also be obtained, and is given in section 2.2.3.
In the calculation of QCD corrections to a simple process like e + e − → hadrons we see that the intervals given by the conventional renormalisation scale variation are not too dissimilar from the 68.3%-credible intervals given by our Bayesian model. These findings, detailed in section 5 and shown in graphical form in figure 4, are perhaps not surprising: the conventional method itself has been built and refined over the years into a form that often returns results compatible with the calculation of successive perturbative orders and with intuitive expectations, and the same hypothesis that it makes implicitly we have made explicitly. Nevertheless, within our method one can now state a precise interval and in addition a detailed degree of belief for it (and possibly bet on it). A Mathematica package implementing the results of this paper is available from the authors.
Obviously this is not the final word in terms of a rigorous characterisation of theoretical uncertainties. We have chosen a very simple process, and we have found a nice self-consistent picture. However, much more work remains to be done in order to extend the method to more complex processes. For one thing, one may wish to accommodate also the presence of a factorisation scale, and therefore additional ingredients like parton distribution or fragmentation functions. Secondly, our Bayesian model as it is now formulated inevitably fails when the behaviour of a physical process is known not to be self-similar to all orders: an obvious example is a process for which a new production channel opens at some perturbative order, or for which a particular kinematical configuration is selected. In such cases, an extension of our hypotheses is obviously called for. We are looking forward to exploring these new avenues in the future.

JHEP09(2011)039
where β(α s ) is the beta function. µ is an arbitrary renormalisation scale, so that it holds dσ d ln µ 2 = 0 = This equation already tells us that the µ-dependence of the coefficient c i is controlled by the lower-order coefficients c j , j ≤ i − 1. Moreover, it shows that c 0 and c 1 are independent of µ. It is then straightforward to conclude that c i is a polynomial of degree less than or equal to i − 1 in ln µ 2 Q 2 : This is true for i = 1 since the derivative of c 1 with respect to µ is zero. Assuming that it is true up to some i, eq. (A.4) shows that dc i+1 d ln µ 2 is a polynomial in ln µ 2 Q 2 of degree less than or equal to i − 1, which makes c i+1 itself a polynomial of degree less than or equal to i.
Rewriting eq. (A.4) order by order in ln µ 2 Q 2 , we can also obtain a recurrence relation giving the values of all c i,l in terms of the c j,0 = c j (Q, Q), j < i, and the β j : Hence, given the calculated values for c i (Q, Q) one can easily reconstruct the full renormalisation scale dependence of the coefficients and of the partial sums σ k = k i=0 c i α i s . Once again, note that only the coefficient c i (Q, Q) = c i,0 needs to be explicitly computed at each order.
Finally, we give the expression of the derivative of a partial sum σ k with respect to ln showing that, as expected, the residual scale dependence of σ k is of higher order α k+1 s . If we now consider that α s 1, observe that the known coefficients β i are such that β i β 0 (see table 1), and assume all the |c i | are of the same order, we can approximate this result as  Table 1. First values of iβ k−i , calculated with n f = 5. The first column gives the first four coefficients of the beta function.

JHEP09(2011)039
Using the fact that when a coefficient is fully known knowing it approximately adds nothing (see eqs. where with N we denote the normalisation factor. In the limit → 0 we can therefore write f (c k+1 |c 0 , . . . , c k ,c k+1 ) = N f (c k+1 |c 0 , . . . , c k )f (c k+1 |c k+1 ) (B.20) The density f (c n |c 0 , . . . , c k ,c k+1 ), for n > k + 1, is then simply Open Access. This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.