The Simplified Likelihood Framework

We discuss the simplified likelihood framework as a systematic approximation scheme for experimental likelihoods such as those originating from LHC experiments. We develop the simplified likelihood from the Central Limit Theorem keeping the next-to-leading term in the large $N$ expansion to correctly account for asymmetries. Moreover, we present an efficient method to compute the parameters of the simplified likelihood from Monte Carlo simulations. The approach is validated using a realistic LHC-like analysis, and the limits of the approximation are explored. Finally, we discuss how the simplified likelihood data can be conveniently released in the HepData error source format and automatically built from it, making this framework a convenient tool to transmit realistic experimental likelihoods to the community.


Introduction
Scientific observations of the real world are by nature imperfect, in the sense that they always contain some amount of uncertainty unrelated to the amount of data collected: the systematic uncertainty. Identifying, measuring and modelling all the sources of systematic uncertainty is an important part of running a scientific experiment. A thorough treatment of such uncertainties is especially important in exploratory fields like particle physics and cosmology. In these fields of research, today's experiments can be of large scale and can contain a huge number of these uncertainties. In the case of the Large Hadron Collider (LHC) experiments, for instance, the experimental likelihood functions used in Standard Model measurements and searches for new physics can contain several thousands of systematic uncertainties.
Although sources of systematic uncertainty can be numerous and of very different nature, a general feature they share is that their most elementary components tend to be independent from each other. This property of independence between the elementary systematic uncertainties has profound consequences, and, as discussed below, is the reason why the approach presented in this work is so effective. Namely, independence of the uncertainties can be used to drastically simplify the experimental likelihood function, for the price of an often-negligible error that will be discussed at length in this paper.
The simplified likelihood (SL) framework we present in this paper is a well-defined approximation scheme for experimental likelihoods. It can be used to ease subsequent numerical treatment like the computation of confidence limits, to allow a uniform statistical treatment of published search-analysis data and to ease the transmission of results between an experiment and the scientific community. We build on the proposals for approximating likelihoods recently suggested in Refs. [1,2], in which promising preliminary results have been shown.
In the context of the LHC, communicating the experimental likelihoods, in their full form or in convenient approximations, was advocated in Refs. [3,4]. One possibility is to communicate the full experimental likelihoods via the RooFit/Roostats software framework [5,6]. The presentation method we propose in this paper is complementary in that it is technically straightforward to carry out, without relying on any particular software package. Additionally, the proposal of presenting LHC results decoupled from systematic uncertainties has been pursued in Ref. [7] in the context of theoretical errors on Higgs cross-sections. For Higgs cross-sections and decays, the combined covariance of the Higgs theoretical uncertainties consistent with the SL framework presented here has been determined in Ref. [8].
In this paper we unify and extend the initial proposals of Refs. [1,2], and thoroughly test the accuracy of the approximations using simulated LHC searches for new phenomena. Compared to Refs. [1,2], an important refinement is that we provide a way to rigorously include asymmetries in the combined uncertainties, which is useful in order to avoid inconsistencies such as a negative event yield. Technically this is done by taking into account the next-to-leading term in the limit given by an appropriate version of the Central Limit Theorem (CLT).
The paper is organised as follows. Section 2 introduces the formalism and key points of our approach. The formal material, including an in-depth discussion of the next-to-leading term of the CLT and the derivation of the SL formula, is presented in Section 3. Practical considerations regarding the SL flexibility and the release of the SL via HepData are given in Section 4. Finally a validation of the SL framework in a realistic pseudo-search at the LHC is presented in Section 5. Section 6 contains our conclusions. Two appendices give some more useful details: Appendix A contains a 1D example of how the skew appears in the asymptotic distribution, and Appendix B presents a reference implementation of the SL written in Python.

From the experimental likelihood to the simplified likelihood
This section introduces the formalism and presents the main theoretical results together with an efficient Monte-Carlo-based calculation method. We will focus on the typical experimental likelihood used in searches for new phenomena at particle physics experiments. However, we stress that the SL approach can be easily generalised to other physics contexts. The data collected in particle physics usually originate from random (quantum) processes, and thus have an intrinsic statistical uncertainty, which vanishes in the limit of large data sets. Our interest rather lies in the systematic uncertainties, which are independent of the amount of data.
A likelihood function $L$ is related to the probability $\Pr$ of observing the data given a model $M$, specified by some parameters,
$$L(\text{parameters}) = \Pr(\text{data} \,|\, M, \text{parameters})\,. \quad (2.1)$$
We denote the observed quantity by $n^{\rm obs}$ and the expected quantity by $n$, where $n$ depends on the model parameters. For example, in the case of a particle physics experiment, these quantities can be the observed and expected numbers of events that satisfy some selection criteria. The full set of parameters includes parameters of interest, here collectively denoted by $\alpha$, and elementary nuisance parameters $\delta = (\delta_1, \ldots, \delta_j, \ldots, \delta_N)^T$, which model the systematic uncertainties. In the SL framework, we derive a set of combined nuisance parameters $\theta$. For $P$ independent measurements, there will be $P$ combined nuisance parameters, $\theta = (\theta_1, \ldots, \theta_I, \ldots, \theta_P)^T$.
The key result at the basis of the SL framework is the approximation
$$\Pr(n^{\rm obs}) = \int \prod_{I=1}^{P} \Pr\big(n^{\rm obs}_I \,\big|\, n_I(\alpha,\delta)\big)\, \pi(\delta)\, d\delta \quad (2.2)$$
$$\hphantom{\Pr(n^{\rm obs})} \approx \int \prod_{I=1}^{P} \Pr\big(n^{\rm obs}_I \,\big|\, a_I + b_I\theta_I + c_I\theta_I^2\big)\, \frac{e^{-\frac12 \theta^T \rho^{-1}\theta}}{\sqrt{(2\pi)^P \det\rho}}\, d\theta\,, \quad (2.3)$$
where the first line is the exact "experimental likelihood" and the second line is the SL. Here $\pi(\delta)$ is the joint probability density of the elementary nuisance parameters. In our assumptions these are independent of each other, hence the prior factorises as $\pi(\delta) = \prod_{j=1}^{N} \pi_j(\delta_j)$. The SL formalism is relevant for $N > P$, which is also the most common case.^1 The coefficients $a_I$, $b_I$ and $c_I$, and the $P \times P$ correlation matrix $\rho = (\rho_{IJ})$, define the SL and are in general functions of the parameters of interest. In concrete cases, however, this dependence will often be negligible. This is in particular the case in particle physics searches for new physics when the expected event number decomposes into signal ($n_s$) plus background ($n_b$) contributions. The parameters of interest that model the new physics enter $n_s$, while $n_b$ is independent of them. Whenever the expected signal is small with respect to the background, the dominant uncertainties in searches for new physics are those related to the background. Neglecting the systematic uncertainties affecting the signal implies in turn that the parameters of the SL are independent of $\alpha$. Hence the expected event number entering the SL of Eq. (2.3) takes the form^2
$$n_I(\alpha, \theta) = n_{s,I}(\alpha) + a_I + b_I\theta_I + c_I\theta_I^2\,, \quad (2.4)$$
which is the expression we use in the rest of this paper. This expression is valid for data with any statistics of observation. However, since the data in particle physics are often observed event counts $n^{\rm obs}_I$, the data will typically follow Poisson statistics, such that
$$\Pr(n^{\rm obs}_I | n_I) \equiv \mathrm{Pois}(n^{\rm obs}_I | n_I) = \frac{(n_I)^{n^{\rm obs}_I}\, e^{-n_I}}{n^{\rm obs}_I!}\,. \quad (2.5)$$
The parameters of the SL ($a_I$, $b_I$, $c_I$, $\rho_{IJ}$) have analytical expressions as functions of the variance and skew of each elementary nuisance parameter (see Section 3.2). However, often the elementary uncertainties and the event yields are already coded in a Monte Carlo (MC) generator.
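As a concrete illustration, the SL integrand (the product of Poisson terms times the Gaussian constraint on the combined nuisance parameters) can be evaluated in a few lines of code. The sketch below uses plain numpy; the function name and signature are ours, not those of the reference implementation of Appendix B.

```python
import math
import numpy as np

def simplified_likelihood(n_obs, n_s, a, b, c, rho, theta):
    """Evaluate the SL integrand at a point theta: Poisson terms with
    yields n_I = n_s,I + a_I + b_I*theta_I + c_I*theta_I^2, times the
    multivariate Gaussian density with correlation matrix rho.
    All arguments are numpy arrays of length P (rho is P x P)."""
    n = n_s + a + b * theta + c * theta**2
    if np.any(n <= 0.0):
        return 0.0                     # outside the physical support
    # sum of log Poisson terms
    log_pois = float(np.sum(n_obs * np.log(n) - n)
                     - sum(math.lgamma(k + 1) for k in n_obs))
    # log of the multivariate Gaussian constraint
    P = len(theta)
    log_gauss = (-0.5 * float(theta @ np.linalg.inv(rho) @ theta)
                 - 0.5 * (P * math.log(2.0 * math.pi)
                          + math.log(np.linalg.det(rho))))
    return math.exp(log_pois + log_gauss)
```

In practice one would maximise or integrate this quantity over $\theta$; the snippet only shows the pointwise evaluation.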
In this case, an elegant method to obtain the SL parameters is the following. From the estimators of the event yields $\hat n_I$, one can evaluate the first three moments of the $\hat n_I$ distribution and deduce the parameters of the SL directly from these moments. What is needed is the mean $m_{1,I}$, the covariance matrix $m_{2,IJ}$ and the diagonal component of the third moment, $m_{3,I} \equiv m_{3,III}$.
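Extracting these moments from an ensemble of MC toys is a one-liner in numpy. The helper below is our own sketch (names are illustrative); it assumes the toys are stored as an array with one row per toy and one column per bin.

```python
import numpy as np

def moments_from_toys(yields):
    """Estimate m1_I, m2_IJ and the diagonal third moment m3_I from MC toys.
    `yields` has shape (n_toys, P): one set of predicted yields per toy."""
    m1 = yields.mean(axis=0)
    resid = yields - m1
    # covariance matrix (population normalisation; for small toy samples
    # one may prefer dividing by n_toys - 1)
    m2 = resid.T @ resid / len(yields)
    # diagonal third central moment
    m3 = (resid**3).mean(axis=0)
    return m1, m2, m3
```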
Using the definition $n_I = a_I + b_I\theta_I + c_I\theta_I^2$, we have the relations
$$m_{1,I} = \mathrm{E}[n_I] = a_I + c_I\,, \quad (2.6)$$
$$m_{2,IJ} = \mathrm{E}\big[(n_I - m_{1,I})(n_J - m_{1,J})\big] = b_I b_J\, \rho_{IJ} + 2\, c_I c_J\, \rho_{IJ}^2\,, \quad (2.7)$$
$$m_{3,I} = \mathrm{E}\big[(n_I - m_{1,I})^3\big] = 6\, b_I^2 c_I + 8\, c_I^3\,, \quad (2.8)$$
where $\mathrm{E}$ denotes the expectation value. Inverting these relations, while taking care to pick the relevant solutions of the quadratic and cubic equations, gives the parameters of the SL. We find
$$c_I = -\mathrm{sign}(m_{3,I})\, \sqrt{2\, m_{2,II}}\; \cos\!\left(\frac{4\pi}{3} + \frac{1}{3}\arctan\sqrt{\frac{8\, m_{2,II}^3}{m_{3,I}^2} - 1}\right)\,, \quad (2.9)$$
$$b_I = \sqrt{m_{2,II} - 2\, c_I^2}\,, \quad (2.10)$$
$$a_I = m_{1,I} - c_I\,, \quad (2.11)$$
$$\rho_{IJ} = \frac{1}{4\, c_I c_J}\left(\sqrt{b_I^2\, b_J^2 + 8\, c_I c_J\, m_{2,IJ}} - b_I b_J\right)\,. \quad (2.12)$$
These formulae apply if the condition $8\, m_{2,II}^3 \geq m_{3,I}^2$ is satisfied. Near this limit the asymmetry becomes large and the approximation inaccurate, because higher-order terms $O(\theta_I^3)$ would need to be included in Eq. (2.3). In practice, however, this requires a high skewness of the nuisance parameters, and the SL framework up to quadratic order is sufficient for most applications.
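The per-bin inversion of the moment relations can be coded directly. The following sketch (our own; function names are illustrative) solves the cubic for $c_I$ via its trigonometric root, then deduces $b_I$, $a_I$ and the pairwise correlation:

```python
import math

def sl_params_from_moments(m1, m2_ii, m3):
    """Return (a, b, c) such that n = a + b*theta + c*theta^2 reproduces
    the mean m1, variance m2_ii and third central moment m3 of one bin."""
    if m3 == 0.0:
        return m1, math.sqrt(m2_ii), 0.0          # symmetric case
    assert 8.0 * m2_ii**3 >= m3**2, "normal expansion not applicable"
    # relevant (middle) root of the cubic 8c^3 + 6(m2 - 2c^2)c = m3
    c = -math.copysign(1.0, m3) * math.sqrt(2.0 * m2_ii) * math.cos(
        4.0 * math.pi / 3.0
        + math.atan(math.sqrt(8.0 * m2_ii**3 / m3**2 - 1.0)) / 3.0)
    b = math.sqrt(m2_ii - 2.0 * c**2)
    a = m1 - c
    return a, b, c

def sl_rho(bI, bJ, cI, cJ, m2_ij):
    """Correlation rho_IJ solving m2_IJ = bI*bJ*rho + 2*cI*cJ*rho^2
    (positive branch of the quadratic; assumes cI*cJ != 0)."""
    return (math.sqrt((bI * bJ)**2 + 8.0 * cI * cJ * m2_ij)
            - bI * bJ) / (4.0 * cI * cJ)
```

A quick consistency check: the parameters $(a, b, c) = (5, 1, 0.2)$ give moments $(5.2,\ 1.08,\ 1.264)$, and the inversion recovers them.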
This method will be used in the examples shown in the rest of the paper. This means that if one is provided with the moments $m_{1,I}$ and $m_{3,I}$ for each bin and the covariance matrix $m_{2,IJ}$, the SL parameters are completely defined. Moreover, in the case where the nuisance parameters affect only the background rate, Eq. (2.4), this computation has to be performed only once, and the resulting likelihood can be used for any kind of signal by appropriate substitution of $n_s(\alpha)$.
A reference code implementing the SL and subsequent test statistics is described in Appendix B and publicly available at https://gitlab.cern.ch/SimplifiedLikelihood/SLtools.

The simplified likelihood from the central limit theorem
This section contains the derivation of the SL formula Eq. (2.3). The reader interested only in the practical aspects of the SL framework can safely skip it. In Section 3.1 we lay down a result about the next-to-leading term of the CLT. In Section 3.2 we then demonstrate Eq. (2.3) and give the analytical expressions of the SL parameters as a function of the elementary uncertainties. The precision of the expansion is discussed in Section 3.3.

Asymmetries and CLT at next-to-leading order
The CLT is often used in its asymptotic limit, where the distribution becomes exactly normal. In the context of the SL framework, however, it is mandatory to keep the next-to-leading term in the CLT's large $N$ expansion. This next-to-leading term encodes the skewness, which carries the main information about the asymmetry of the distribution. This asymmetry is a relevant feature for the analyses, hence it is in principle safer to keep this information. However, keeping the asymmetry is truly critical for a slightly different reason, discussed below. A normal distribution has support on $\mathbb{R}$, while quantities like event yields are defined on $\mathbb{R}^+$. Having a nuisance parameter with a normal distribution can therefore give inconsistencies such as a negative yield. This is a problem both conceptually and concretely, when marginalising or profiling the likelihood. The issue occurs because the Gaussian limit of the CLT loses information about the support; using this limit is simply too rough an approximation. Instead, to have an asymmetric support such as $\mathbb{R}^+$, the distribution must be asymmetric, and therefore the skew must be taken into account.
The deformed Gaussian obtained when taking the skew into account does not in general seem to have an analytical PDF. However, by using the large $N$ expansion, we are able to express the CLT at next-to-leading order in a very simple way. We find that a random variable $Z$ of variance $\sigma^2$ and third moment $\gamma$, with characteristic function
$$\varphi_Z(t) = \exp\!\left(-\frac{\sigma^2 t^2}{2} - \frac{i\,\gamma\, t^3}{6} + O(N^{-1})\right)\,, \quad (3.1)$$
can, up to higher-order terms in the large $N$ expansion, be equivalently expressed in terms of an exactly Gaussian variable $\theta \sim \mathcal{N}(0,1)$ in the form
$$Z = \sigma\,\theta + \frac{\gamma}{6\,\sigma^2}\left(\theta^2 - 1\right) + O(N^{-1})\,. \quad (3.2)$$
We will refer to this type of expression as a "normal expansion". Details about its derivation are given in Appendix A. Equation (3.2) readily gives the most basic CLT at next-to-leading order when assuming $Z = N^{-1/2}\sum_{j=1}^{N} \delta_j$, where the $\delta_j$ are independent identically distributed centred nuisance parameters; in this case the third moment of $Z$ is explicitly suppressed as $N^{-1/2}$, so the skew term in Eq. (3.2) is the next-to-leading term of the expansion. The method works similarly with the Lyapunov CLT, i.e. when the $\delta_j$ are not identical and have different moments $\sigma_j^2$, $\gamma_j$, in which case one defines $\sigma^2 = N^{-1}\sum_{j=1}^{N} \sigma_j^2$ and $\gamma = N^{-1}\sum_{j=1}^{N} \gamma_j$. Finally, our approach applies similarly to the multidimensional case, where various linear combinations of the $\delta_j$ give rise to various $Z_I$. The $Z_I$ have a covariance matrix $\Sigma_{IJ}$ and a skewness tensor $\gamma_{IJK} = \mathrm{E}[Z_I Z_J Z_K]$. For our purposes, we neglect the non-diagonal elements of $\gamma$, keeping only the diagonal elements, denoted $\gamma_{III} \equiv \gamma_I$. These diagonal elements encode the leading information about asymmetry, while the non-diagonal ones contain subleading information about asymmetry and correlations. With this approximation, we obtain the multidimensional CLT at next-to-leading order,
$$Z_I = \Sigma_{II}^{1/2}\,\theta_I + \frac{\gamma_I}{6\,\Sigma_{II}}\left(\theta_I^2 - 1\right) + O(N^{-1})\,, \quad (3.3)$$
where the $\theta_I$ are correlated standard normal variables. This result will be used in the following. Again, for $\gamma_I \to 0$, one recovers the standard multivariate CLT.
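The quality of the normal expansion is easy to probe numerically. The toy study below (our own illustration, not taken from the paper) builds $Z$ as a normalised sum of $N = 50$ centred exponential variables (unit variance, skew 2) and compares its sample skewness to that of the expansion $\sigma\theta + \tfrac{\gamma_Z}{6\sigma^2}(\theta^2-1)$ with $\sigma = 1$ and $\gamma_Z = 2/\sqrt{N}$:

```python
import numpy as np

def sample_skew(x):
    """Sample skewness: third central moment over variance^(3/2)."""
    d = x - x.mean()
    return (d**3).mean() / (d**2).mean()**1.5

rng = np.random.default_rng(42)
N, n_toys = 50, 200_000

# elementary nuisances: centred exponentials (unit variance, skew 2)
delta = rng.exponential(1.0, size=(n_toys, N)) - 1.0
Z = delta.sum(axis=1) / np.sqrt(N)       # variance 1, third moment 2/sqrt(N)

# normal expansion of Z in terms of an exactly Gaussian theta
gamma_Z = 2.0 / np.sqrt(N)               # third moment of Z
theta = rng.standard_normal(n_toys)
W = theta + (gamma_Z / 6.0) * (theta**2 - 1.0)
```

Both samples carry the same skewness of about $2/\sqrt{50} \approx 0.28$, confirming that the $\theta^2$ term reproduces the leading asymmetry of the sum.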

Calculation of the simplified likelihood
Let us now prove Eq. (2.3). The dependence on the parameters of interest $\alpha$ is left implicit in this section. We will first perform a step of propagation of the uncertainties, then a step of combination. This is a generalisation of the approach of [1]. Here we take the skew into account, hence there is no need to use an exponential parameterisation as in [1]. In this section the elementary nuisance parameters $\delta_i$ are independent, centred, have unit variance, and have skew $\gamma_i$, i.e.
$$\mathrm{E}[\delta_i] = 0\,, \qquad \mathrm{E}[\delta_i\, \delta_j] = \delta_{ij}\,, \qquad \mathrm{E}[\delta_i^3] = \gamma_i\,,$$
with $\delta_{ij}$ the Kronecker delta.
It is convenient to use a vector notation for the set of these elementary nuisance parameters, $\delta = (\delta_1, \ldots, \delta_N)^T$. As a first step, we want to propagate the systematic uncertainties to the level of the event numbers. For an event number $n$ depending on a quantity $Q$ subject to an uncertainty of relative magnitude $\Delta_Q$, we have
$$n = n(Q)\,, \qquad Q = Q^0\left(1 + \Delta_Q\, \delta\right)\,.$$
The propagation amounts to performing a Taylor expansion with respect to $\Delta_Q$. This expansion should be truncated appropriately to retain the leading effects of the systematic uncertainties in the likelihood. It was shown in [1] that the expansion should be truncated above second order. For multiple sources of uncertainty, we have a vector $\delta$, and the relative uncertainties propagated to $n_I$ are written as
$$n_I = n^0_I \left(1 + \Delta_{1,I} \cdot \delta + \delta^T \cdot \Delta_{2,I} \cdot \delta\right) + O\big(n^{(3)}\big)\,,$$
with $\Delta_{1,I}$ a vector, $\Delta_{2,I}$ a matrix, and $n^{(3)}$ denoting schematically the third derivatives of $n$. The second step is to combine the elementary nuisance parameters. We introduce combined nuisance parameters $\theta_I$, which are chosen to be centred and with unit variance without loss of generality, and whose correlation matrix is denoted $\rho_{IJ}$, i.e.
$$\mathrm{E}[\theta_I] = 0\,, \qquad \mathrm{E}[\theta_I^2] = 1\,, \qquad \mathrm{E}[\theta_I\, \theta_J] = \rho_{IJ}\,.$$
Moreover, we define the expected event number in terms of the combined nuisance parameters as
$$n_I = a_I + b_I\,\theta_I + c_I\,\theta_I^2\,, \quad (3.9)$$
where $a_I$ contains the nominal yield $n^0_I$. The $a_I$, $b_I$, $c_I$ parameters, together with the correlation matrix $\rho_{IJ}$, fully describe the combined effect of the elementary uncertainties. To determine them, we identify the first three moments on each side of Eq. (3.9). This yields expressions for $a_I$, $b_I$, $c_I$ and $\rho_{IJ}$ in terms of $\Delta_{1,I}$, $\Delta_{2,I}$ and the skews $\gamma_i$, up to $O(\Delta^4)$ terms such as $\mathrm{tr}(\Delta^T_{2,I}\cdot\Delta_{2,I})$, $(\mathrm{tr}\,\Delta_{2,I})^2$ and $\Delta^T_{1,I}\cdot\Delta_{1,I}\,\mathrm{tr}\,\Delta_{2,I}$, which are neglected. When $\gamma_i \to 0$, one recovers the expressions obtained in Ref. [1].^3 Importantly, the $\Delta_2$ term contributes at leading order only to the mean value $a_I$ and always gives subleading contributions to the higher moments. Hence, for considerations on higher moments, which define the shape of the combined distribution, we can safely take the approximation
$$n_I \approx n^0_I \left(1 + \Delta_{1,I} \cdot \delta\right) \quad (3.14)$$
from Eq. (3.9). We now make the key observation that this quantity is the sum of a large number of independent random variables. These are exactly the conditions for a central limit theorem to apply. As the elementary uncertainties have in principle different shapes and magnitudes, we apply Lyapunov's CLT [9], using for instance Lyapunov's condition on the third moment; provided this condition is satisfied, the properly normalised sum in Eq. (3.14) converges to a Gaussian in the large $N$ limit. Furthermore, we can see that the expression of $n_I$ in terms of the combined nuisance parameters, $n_I = a_I + b_I\theta_I + c_I\theta_I^2$ (first defined in Eq. (3.9)), takes the form of a normal expansion, see Eq. (3.3). This means that the $c_I\theta_I^2$ term corresponds precisely to the leading deformation described by the next-to-leading term of the CLT. This deformation encodes the skewness induced by the asymmetric elementary uncertainties. We have therefore obtained a description of the main collective effects of asymmetric elementary uncertainties, which is dictated by the CLT. The resulting simplified likelihood is given in Eq. (2.3).

Precision of the normal expansion
The accuracy of the normal expansion $n = a + b\theta + c\theta^2$ with $\theta \sim \mathcal{N}(0,1)$, and thus of the simplified likelihood, is expected to drop when only a few elementary uncertainties are present and these depart substantially from the Gaussian shape. This is the situation in which the next-to-leading CLT, Eq. (3.3), tends to fail. It is instructive to check in a simple case how the normal expansion approximates the true distribution, and in which way discrepancies tend to appear.
We consider the realistic case of a log-normal distribution with parameters $\mu$, $\sigma$. We fix $\mu = 0$ without loss of generality. The mean and the first centred moments are
$$m_1 = e^{\sigma^2/2}\,, \qquad m_2 = \big(e^{\sigma^2}-1\big)\, e^{\sigma^2}\,, \qquad m_3 = \big(e^{\sigma^2}-1\big)^2 \big(e^{\sigma^2}+2\big)\, e^{3\sigma^2/2}\,,$$
and $a$, $b$, $c$ are obtained using Eqs. (2.9)-(2.12). For $\sigma \approx 0.69$, the bound $8 m_2^3 = m_3^2$ is reached (see Section 2). This is the limit where the distribution is so asymmetric that the variance comes entirely from the $\theta^2$ term. Beyond this bound the normal expansion cannot be used at all, as Eqs. (2.9)-(2.12) have no solution. The distribution has $c > 0$, thus $n$ has a lower bound given by $n > a - b^2/4c$. Below this limit on $\sigma$, the lower bound on $n$ is roughly $n \gtrsim 0.5$, therefore the approximation can never produce a negative event yield.
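The critical value $\sigma \approx 0.69$ can be recovered numerically from the log-normal moments. The sketch below (our own helper names) bisects for the $\sigma$ at which $8 m_2^3 = m_3^2$:

```python
import math

def lognormal_centred_moments(sigma):
    """Mean, variance and third central moment of LogNormal(mu=0, sigma)."""
    x = math.exp(sigma**2)
    m1 = math.sqrt(x)
    m2 = (x - 1.0) * x
    m3 = (x - 1.0)**2 * (x + 2.0) * x**1.5
    return m1, m2, m3

# bisect for the sigma at which 8 m2^3 = m3^2, the boundary beyond which
# the moment-inversion formulae have no solution
lo, hi = 0.1, 1.5
for _ in range(60):
    mid = 0.5 * (lo + hi)
    _, m2, m3 = lognormal_centred_moments(mid)
    if 8.0 * m2**3 - m3**2 > 0.0:   # still inside the allowed region
        lo = mid
    else:
        hi = mid
sigma_max = 0.5 * (lo + hi)
```

The bisection converges to $\sigma_{\max} \approx 0.691$, matching the bound quoted in the text.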
To check numerically how well the approximation performs, the true and approximate PDFs are compared in Figure 1 for various values of $\sigma$. Since the approximate PDF never gives $n < 0.5$, it can only be a good approximation if the true PDF vanishes in this region. This is the case for asymmetries $\sigma \lesssim 0.3$, and as can be seen in the figure, the normal approximation indeed works very well there. For larger asymmetries, $\sigma = 0.45$ in our example, the true PDF becomes sizeable in the region $n < 0.5$. The approximation still performs reasonably well at larger $n$; however, near $n \sim 0.5$ the approximate PDF tends to increase and become peaked, to account for the area at $n < 0.5$ that it cannot reproduce. This behaviour will also be observed for certain bins in the LHC-like analysis implemented in Sec. 5.
Overall, through this example, we can see that the normal approximation tends to become inaccurate for a skewness of $\sim 100$-$150\%$. This is a moderate value; one should keep in mind, however, that these considerations apply to the combined uncertainties, for which small skewness is typical. The accuracy of the SL framework will be tested in a realistic setup in Sec. 5.

Range of application
An important feature of the SL is that it is flexible in the sense that the combination of the systematic uncertainties does not have to be applied to the whole set. The only requirement to combine a subset of the uncertainties is that it should have a convergent enough CLT behaviour in order for the SL to be accurate. There is thus a freedom in partitioning the set of systematic uncertainties, giving rise to variants of the SL that can be either equivalent or slightly different upon marginalising.
For instance, if a single systematic uncertainty $\delta$ is left apart from the combination, the SL takes the form
$$\Pr(n^{\rm obs}) \approx \int \prod_I \Pr\big(n^{\rm obs}_I \,\big|\, n_I(\theta, \delta)\big)\, \frac{e^{-\frac12\theta^T\rho^{-1}\theta}}{\sqrt{(2\pi)^P\det\rho}}\; \pi(\delta)\, d\theta\, d\delta\,. \quad (4.1)$$
Similarly, if two subsets of systematic uncertainties $\theta$ and $\tilde\theta$ tend to separately satisfy the CLT condition, they can be separately combined, giving
$$\Pr(n^{\rm obs}) \approx \int \prod_I \Pr\big(n^{\rm obs}_I \,\big|\, n_I(\theta, \tilde\theta)\big)\, \frac{e^{-\frac12\theta^T\rho^{-1}\theta}}{\sqrt{(2\pi)^P\det\rho}}\; \frac{e^{-\frac12\tilde\theta^T\tilde\rho^{-1}\tilde\theta}}{\sqrt{(2\pi)^{\tilde P}\det\tilde\rho}}\, d\theta\, d\tilde\theta\,. \quad (4.2)$$
The SL naturally accommodates any such partition. It is actually commonplace in LHC analyses to present systematic uncertainties combined in subsets, for example "theoretical", "experimental", "luminosity" and "MC" uncertainties. This is useful not only for informative purposes but also for further interpretations. For example, the theoretical uncertainties may be improved later on, and it is clearly an advantage if their effect can be re-evaluated without having to re-analyse the whole data (which could only be done by collaboration insiders).^4 Another reason to single out a nuisance parameter from the combination (as in Eq. (4.1)) is if it has a strongly non-Gaussian PDF that one prefers to take into account exactly. In order to profit from the versatility of the SL, an equally versatile format is needed to release the SL data. This is the topic of the next subsection.
Finally, if systematic uncertainties on the signal are taken into account, the $b$, $c$, $\rho$ parameters become dependent on the parameters of interest $\alpha$. While there is no conceptual difference with the case of background-only systematics, there are important practical differences. Numerical evaluations become heavier, since the parameters of the SL, especially $\rho_{IJ}(\alpha)$, which requires a matrix inversion, have to be evaluated for each value of $\alpha$. Furthermore, the presentation of the SL data may also become more involved.

Construction and presentation
There are in principle two ways of releasing the data needed to build the simplified likelihood. One way is to release the whole set of elementary systematic uncertainties, the other to release the first three moments of the PDF of the combined systematic uncertainties. While the former is in principle doable, we will focus only on the latter. Indeed, the elementary uncertainties are usually already coded by the experimentalists in MC generators, hence it is straightforward to evaluate these moments.^5 We thus focus on the release of the SL data via the $m_{1,I}$, $m_{2,IJ}$, $m_{3,I}$ moments of the PDF of the combined systematic uncertainties, already defined in Eqs. (2.6)-(2.8), where $m_{3,I}$ is the diagonal part of the third-rank tensor $m_{3,IJK}$. Evaluating these moments via MC toys is straightforward for the experimental analysis. However, their presentation needs to be considered in detail, taking into account the available tools and current practices. This is the purpose of this subsection. Key to the usefulness of any likelihood data for analysis reinterpretation is the availability of that data in a standard format. For global fits, where tens or hundreds of analyses may be used simultaneously, it is crucial that this format be unambiguously parseable by algorithms without human assistance. A standard location is also necessary, for which the obvious choice is the longstanding HEP data repository, HepData [10].

^4 Such a combination of theoretical uncertainties has been done in [8] for the Higgs production and decay rates and can be implemented in a Higgs SL.
^5 Using the elementary uncertainties may be more convenient when one wishes to include the systematic uncertainties on the signal, i.e. $\alpha$-dependent $b$, $c$, $\rho$. Since these systematics are not crucial for new physics searches, we do not take them into account here.
It is convenient to refer to the data in terms of the order of the moment from which they originate. We will use the term "n-th order data" to refer to information coming from a moment of order n; here, n will go only up to 3. Second-order data includes the covariance matrix, correlation matrix, and/or diagonal uncertainties: these can be given either in a relative or absolute parametrisation. There is the same kind of freedom for third-order data but this does not need to be discussed here. In addition to the moments of the combined systematic uncertainties, this terminology will also apply to the observed central values and statistical uncertainties usually presented by the experiments.
Let us review the current formats of presentation of likelihood data. The presentation of first-order data is standardised, while currently no third-order data are usually given. Regarding second-order data, there is unfortunately no standard representation currently established. A review of the second-order data in HepData and on the experiments' analysis websites reveals a mixture of presentation styles:
• Table format: 2D histograms of either covariance or correlation matrices. This has the difficulty that the convention used is not made clear (other than by inspection of the matrix diagonal), and without a structural association with a first-order dataset it is impossible for computer codes to unambiguously construct the relevant likelihood.
In the case of the presentation of a correlation (as opposed to covariance) matrix, the diagonal variances must be provided with the first-order dataset.
• Error source format: a vector of labelled $\pm$ terms associated with each element of the first-order dataset. The correlations between the error sources are indicated via the labels (e.g., a "stat" label denotes a purely diagonal contribution, a "lumi" label one that is 100% correlated across all bins, and all other labelled uncertainties are treated as orthogonal). The correlation or covariance matrices can be constructed using Eq. (4.3). This format presents the second-order data in the form of "effective" elementary uncertainties.
• Auxiliary files in arbitrary format: the ad hoc nature of these makes them impossible to be handled by unsupervised algorithms. This includes 2D histograms in ROOT data files, since variations in path structure and the ambiguity between covariance or correlation matrices are an impediment to automated use. This presentation style will be disregarded below.
The table and error source formats may be readily extended for automated data handling and are thus appropriate to release SL data.
In the case of the table format, in addition to the observed central values and statistical uncertainties usually released, extra HepData tables can encode the m 1,I , m 2,IJ , m 3,I moments describing the combined nuisance parameters. However the HepData table headers will have to be augmented in a standardised fashion to express the relationships between tables, i.e. unambiguously identifying the moment data tables associated with a first-order dataset. While the format is conceptually straightforward, introducing the semantic description of the tables is at present highly impractical. We hence recommend the error source format for which identifying the associations between datasets is trivial.
In the error source format, the $m_{1,I}$, $m_{2,IJ}$, $m_{3,I}$ moments are all encoded in the form of labelled vectors. The $m_{2,IJ}$ matrix is reconstructed via a sum of the form
$$m_{2,IJ} = \sum_i a_{I,i}\, a_{J,i}\,, \quad (4.3)$$
where the $a_{I,i}$ are the released error sources. The vector of third-order data can be indicated via a special label. There is no limit on the number of labels associated with an element, hence this format is very flexible. For instance, the $a_{I,i}$ error sources corresponding to the decomposed covariance can simply be given generic names such as "sys,NP1", but can also be extended with, e.g., a "th" prefix to allow separation of experimental and theory systematics (since the theory can in principle be improved upon in future reinterpretations). This format requires some keyword standardisation. The final scheme should be capable of applying equally to any kind of experimental data and systematic uncertainties. In particular, it should be valid for event counts, for differential cross-sections with bins correlated by the systematic uncertainties, for correlations between the bins of different distributions/datasets, and so on.
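Reconstructing the covariance from released error sources amounts to one matrix product. In the toy example below (all numbers invented for illustration), a "stat" source contributes one independent column per bin, a "lumi" source one fully correlated column, and "sys,NP1" one further orthogonal source:

```python
import numpy as np

# hypothetical error sources for three bins
stat = np.diag([1.0, 0.8, 0.5])           # diagonal: one column per bin
lumi = np.array([[0.2], [0.3], [0.1]])    # single fully correlated column
np1  = np.array([[0.5], [-0.2], [0.1]])   # one orthogonal systematic

A = np.hstack([stat, lumi, np1])   # a_{I,i}: P rows, one column per source
m2 = A @ A.T                        # m2_IJ = sum_i a_{I,i} a_{J,i}
```

By construction `m2` is symmetric and positive semi-definite, which is one practical advantage of releasing sources rather than the matrix itself.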
Summarising, our recommendation is to release the moments of the combined uncertainty distributions via the HepData error source format, which has built-in semantics of arbitrary complexity and can thus make the most of the SL framework. As a showcase example, we provide the pseudo-data used in the next section as a sandbox HepData record at https://www.hepdata.net/record/sandbox/1535641814.

Simplified likelihood in a realistic LHC-like analysis
In this section we introduce a realistic pseudo-analysis that is representative of a search for new physics at the LHC. This analysis will be used to validate the SL method and to test its accuracy in realistic conditions. It is also used to validate the SL reference code presented in Appendix B. Finally, this pseudo-analysis provides a concrete example of SL data release via the HepData error source format (see above). The SL and subsequent results of the pseudo-search can be reproduced using these data.
As already mentioned in Section 2, the dominant systematic uncertainties relevant in searches for new physics are those related to the background processes. Imperfect knowledge of detector effects or approximations used in the underlying theoretical models will lead to uncertainties in the predictions of these processes. Any mis-estimation of the background could result in an erroneous conclusion regarding the presence (or absence) of a signal. There are a number of different ways in which an experimentalist may assess the effect of a given systematic uncertainty, but generally these effects are parameterised using knowledge of how the estimate of a given process changes under variations of some underlying parameter of the simulation model, theory, detector resolution, etc. Estimates of the contribution from background processes are obtained either from simulation or through data-driven methods. In the following section, we describe a pseudo-search for new physics, inspired by those performed at the LHC, in which systematic uncertainties are included, and derive the SL parameters for it.

An LHC-like pseudo-search for new physics
In order to illustrate the construction of the SL, a model has been constructed which is representative of a search for new physics at the LHC. Typically, in these searches the observed events are binned into histograms in which the ratio of signal to background contribution varies with the bin number. A search performed in this way is typically referred to as a 'shape' analysis as the difference in the distribution (or shape) of the signal events, compared to that of the background, provides crucial information to identify a potential signal.
Our pseudo-search requires assumptions for an "observed" dataset, for the corresponding background, and for the new physics signal. These ingredients are summarised in Figure 2, which shows the distribution of events in each of three categories, along with the expected contribution from the background, the uncertainties thereon, and the contribution from some new physics signal. The 'nominal' background follows a typical exponential distribution on which fluctuations are present, representing a scenario in which limited MC simulation (or limited data in some control sample) was used to derive the expected background contribution. The resulting uncertainties, indicated by the blue band, are uncorrelated between the different bins. Additionally, there are two uncertainties which modify the 'shape' of the backgrounds in a correlated way. The effects of these uncertainties are indicated by alternative distributions representing 'up' and 'down' variations of the systematic uncertainty. Finally, there are two uncertainties which affect only the overall expected rate of the backgrounds. These are indicated in each category as uncertainties on the normalisation $N$ of the background. They are correlated between the three categories and represent two typical experimental uncertainties: a veto efficiency uncertainty (eff.) and the uncertainty from some data-simulation scale factor (s.f.) applied to the simulation.

Parameterisation of backgrounds
It is typical in experimental searches of this type to classify systematic uncertainties into three broad categories: those which affect only the normalisation of a given process; those which affect the 'shape' or 'distribution' of events of that process in addition to its normalisation; and those which affect only a single bin or a small number of bins in the distribution and are largely uncorrelated with the other bins (e.g. uncertainties due to limited MC simulation). The expected (or nominal) number of background events, due to a particular process, in a given bin I in Eq. (2.3) is denoted by
  $n_{b,I}(\delta) = N(\delta)\, f_I(\delta)$,   (5.1)
where the process index k is suppressed here as we only have a single background process. The functions $N(\delta)$ and $f_I(\delta)$ are the total number of expected events for that process in a particular category and the fraction of those events expected in bin I, respectively, for a specified value of δ. Often, these functions are not known exactly and some interpolation is performed between known values of $n_I$ at certain values of δ. For each uncertainty j which affects the fractions $f_I$, a number of different interpolation schemes exist. One common method, however, is to interpolate between three distribution templates representing three values of $\delta_j$. Typically, these are for $\delta_j = 0$, the nominal value, and $\delta_j = \pm 1$, representing the plus and minus 1σ variations due to that uncertainty. The interpolation is given by
  $f_I(\delta) = \frac{f^0_I \prod_j p_{Ij}(\delta_j)}{F(\delta)}$,  with  $F(\delta) = \sum_I f^0_I \prod_j p_{Ij}(\delta_j)$,   (5.2)
where $f^0_I = f_I(\delta = 0)$ and the division by $F(\delta)$ ensures that the fractions sum to 1. In our pseudo-search, as there are three event categories, there are three of these summations, each of which runs over the 30 bins of that category. The polynomial $p_{Ij}(\delta_j)$ is chosen to be quadratic between values of $-1 \le \delta_j \le 1$ and linear outside that range, such that
  $p_{Ij}(\delta_j) = 1 + \tfrac{1}{2}\left[(\kappa^+_{Ij} - \kappa^-_{Ij})\,\delta_j + (\kappa^+_{Ij} + \kappa^-_{Ij} - 2)\,\delta_j^2\right]$  for $|\delta_j| \le 1$,   (5.3)
continued linearly beyond that range, so that $p_{Ij}(0) = 1$ and $p_{Ij}(\pm 1) = \kappa^{\pm}_{Ij}$. The values of $\kappa^-_{Ij}$ and $\kappa^+_{Ij}$ are understood to be determined using the ratios of the template for a −1σ variation to the nominal one and the +1σ variation to the nominal one, respectively.
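The interpolation above can be sketched as follows. This is a minimal illustration, not the experiments' implementation; the function names are ours, and the linear continuation outside $|\delta| \le 1$ is here taken slope-matched at the boundary (a C¹ choice; other conventions match only the value):

```python
import numpy as np

def p_template(delta, kappa_minus, kappa_plus):
    """Quadratic template interpolation for |delta| <= 1, linear outside.
    Satisfies p(0) = 1, p(+1) = kappa_plus, p(-1) = kappa_minus."""
    d = np.asarray(delta, dtype=float)
    s = kappa_plus + kappa_minus - 2.0        # curvature (asymmetry of the band)
    a = kappa_plus - kappa_minus              # linear (odd) part
    quad = 1.0 + 0.5 * (a * d + s * d * d)
    # linear continuations with the slope of the quadratic at |delta| = 1
    up = kappa_plus + (0.5 * a + s) * (d - 1.0)
    down = kappa_minus + (0.5 * a - s) * (d + 1.0)
    return np.where(d > 1.0, up, np.where(d < -1.0, down, quad))

def f_interp(f0, deltas, kappas_minus, kappas_plus):
    """Interpolated fractions f_I(delta): nominal fractions f0 multiplied by
    the per-uncertainty factors, then renormalised so they sum to 1."""
    f = np.array(f0, dtype=float)
    for dj, km, kp in zip(deltas, kappas_minus, kappas_plus):
        f = f * p_template(dj, km, kp)
    return f / f.sum()
```

The renormalisation in `f_interp` plays the role of the $F(\delta)$ factor.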
For uncertainties which directly modify the expected number of events $n_I$ of the distributions, an exponential interpolation is used as the parameterisation. This is advantageous since the number of events for this process in any given bin remains greater than 0 for any value of $\delta_j$. For a relative uncertainty $\epsilon_{Ij}$, the yield varies as
  $n_{b,I} \to n^0_{b,I} \prod_j e^{\epsilon_{Ij}\,\delta_j}$.   (5.4)
This is most common in the scenario where a limited number of MC simulation events is used to determine the value of $n^0_{b,I}$ and hence there is an associated uncertainty. As these uncertainties are uncorrelated between bins of the distributions, most of the terms $\epsilon_{Ij}$ will be 0.
Systematic uncertainties that affect only the overall normalisation are also interpolated using exponent functions,
  $N(\delta) = N^0 \prod_j K_j^{\delta_j}$,   (5.5)
where $N^0 = N(\delta = 0)$ and j runs over the elementary nuisance parameters. A simple extension arises if the uncertainty is 'asymmetric', as in our pseudo-search: the value of $K_j$ is set to $K^+_j$ for $\delta_j \ge 0$ and to $K^-_j$ for $\delta_j < 0$. Furthermore, any uncertainty which affects both the shape and the normalisation can be incorporated by including terms such as those in Eq. (5.2) in addition to one of these normalisation terms. In our pseudo-search, there is a separate $N(\delta)$ term for each category, which provides the total expected background rate summed over the 30 bins of that category.
Combining Eqs. (5.2), (5.4) and (5.5) yields the full parameterisation,
  $n_{b,I}(\delta) = N(\delta)\, f_I(\delta) \prod_j e^{\epsilon_{Ij}\,\delta_j}$.   (5.6)
As already mentioned, a typical search for new physics will have contributions from multiple background processes, each with its own associated systematic uncertainties. Only by summing over all of these backgrounds (i.e. $n_{b,I} = \sum_p n_{b,p,I}$ for the different background processes p) is the likelihood fully specified.
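The combined parameterisation can be sketched directly; the following minimal code, with names and signatures of our own choosing, multiplies an overall normalisation carrying asymmetric $K^{\pm}_j$ factors, the (already interpolated) fractions, and the per-bin exponential MC-stat terms:

```python
import numpy as np

def norm_factor(delta, k_minus, k_plus):
    """Asymmetric normalisation factor K^delta, with K = k_plus for
    delta >= 0 and K = k_minus for delta < 0."""
    k = np.where(np.asarray(delta) >= 0, k_plus, k_minus)
    return k ** delta

def n_background(N0, f, eps_stat, delta_stat, norm_deltas, k_minus, k_plus):
    """Expected background yields per bin,
       n_I(delta) = N(delta) * f_I(delta) * exp(eps_I * delta_I),
    where N(delta) = N0 * prod_j K_j^{delta_j} collects the pure
    normalisation uncertainties, f are the interpolated fractions, and the
    exponential keeps the per-bin MC-stat term positive for any delta."""
    N = N0
    for dj, km, kp in zip(norm_deltas, k_minus, k_plus):
        N = N * norm_factor(dj, km, kp)
    return N * np.asarray(f) * np.exp(np.asarray(eps_stat) * np.asarray(delta_stat))
```

By construction every factor is positive, so the parameterised yields stay positive for arbitrary nuisance-parameter values.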

Validation of the simplified likelihood
Here we compare the true and simplified likelihoods arising from the pseudo-search. It is also instructive to consider the simplified likelihood obtained when neglecting the third moments, i.e. when setting the coefficients of the quadratic terms, $c_I$, to zero in Eq. (2.3). This less accurate version of the SL will be referred to as the "symmetric SL", as opposed to the more precise "asymmetric SL" developed in this work.
We constructed 100,000 pseudo-datasets by taking random values $\tilde\delta$, generated according to $\pi(\delta)$, and evaluating $n_{b,I}(\delta)$ for each dataset according to Eq. (5.6). Figure 3 shows the distribution of $\tilde n_i$, for an example bin, i = 62, from the SL. The values of $m_1$, $m_2$ and $m_3$ are calculated using the pseudo-datasets and subsequently used to calculate the coefficients for the SL. In Figure 4, 2D projections of the background distributions are shown between four pairs of signal-region bins: bin pair (4, 7) shows a projection for high-statistics bins where both the asymmetric and symmetric SL agree closely with the true distribution (that obtained in the pseudo-datasets); the true distribution in (4, 62) starts to display deviations from the multivariate normal approximation which are well captured by the asymmetric SL. This is expected when the skew, defined as $m_{3,I}/(m_{2,II})^{3/2}$, is small. However, in the bottom pair of plots, in which bins 4 and 62 are paired with the low-statistics bin 86, the proximity of the mean rate to zero induces a highly asymmetric Poisson distribution which neither SL can model well. In these last two plots, it can be seen that the asymmetric SL peaks at too low a value, near a sudden cutoff also seen in Figure 3, while the symmetric SL peaks at too high a value. In this region a better modelling would require evaluation of higher-order coefficients (and/or off-diagonal skew terms) and hence higher moments of the experimental distributions.
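The conversion of moments into SL coefficients can be sketched in closed form. Under the convention $n = a + b\theta + c\theta^2$ with $\theta \sim N(0,1)$, the Gaussian moments $E[\theta^4]=3$ and $E[\theta^6]=15$ give $m_1 = a + c$, $m_2 = b^2 + 2c^2$ and $m_3 = 6b^2c + 8c^3$; eliminating $b^2$ leaves a cubic in c. The helper below (its name is ours, not that of the reference code) solves that cubic numerically:

```python
import numpy as np

def sl_coefficients(m1, m2, m3):
    """Invert m1 = a + c, m2 = b^2 + 2c^2, m3 = 6 b^2 c + 8 c^3 for one bin.
    Substituting b^2 = m2 - 2c^2 gives the cubic -4c^3 + 6 m2 c - m3 = 0;
    the admissible root must satisfy 2c^2 <= m2 so that b stays real."""
    roots = np.roots([-4.0, 0.0, 6.0 * m2, -m3])
    real = roots[np.abs(roots.imag) < 1e-9].real
    admissible = real[2.0 * real**2 <= m2 * (1.0 + 1e-9)]
    c = admissible[np.argmin(np.abs(admissible))]   # smallest-|c| valid root
    b = np.sqrt(m2 - 2.0 * c * c)
    a = m1 - c
    return a, b, c
```

Note that for a symmetric bin ($m_3 = 0$) the admissible root is $c = 0$ and the symmetric SL is recovered; the bin-to-bin correlations $\rho_{IJ}$ are then fixed separately from the off-diagonal $m_{2,IJ}$.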
An advantage of the asymmetric SL is that a strictly positive approximate distribution can be guaranteed, while the symmetric SL can have a significant negative yield fraction, as seen in the figures for bin 86. Sampling from the symmetric SL, e.g. for likelihood marginalisation, requires the background rates to be positive, since they are propagated through the Poisson distribution. The asymmetric SL provides a controlled solution to this issue, as opposed to ad hoc fixes such as using a log-normal distribution or setting negative-rate samples to zero or to an infinitesimal value: for bin 86, the symmetric SL has a negative fraction of ∼ 11.6%, while the asymmetric SL has a negative fraction of exactly zero.
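The positivity argument is easy to check numerically. The toy below uses the bin-86 coefficients quoted in the convergence study of this section; it is only an illustration with a standard-normal θ, so the symmetric negative fraction it produces need not reproduce the ∼11.6% obtained from the full pseudo-search moments:

```python
import numpy as np

# bin-86 coefficients quoted in the convergence study
a, b, c = 0.90, 0.47, 0.13

# Asymmetric parameterisation a + b*theta + c*theta^2: since c > 0 the
# minimum over theta sits at theta = -b/(2c), i.e. n_min = a - b^2/(4c)
n_min = a - b * b / (4.0 * c)

# Symmetric parameterisation matching the same first two moments:
# a' = m1 = a + c,  b' = sqrt(m2) = sqrt(b^2 + 2 c^2)
a_sym = a + c
b_sym = np.sqrt(b * b + 2.0 * c * c)

theta = np.random.default_rng(1).standard_normal(500_000)
neg_frac_sym = np.mean(a_sym + b_sym * theta < 0.0)
neg_frac_asym = np.mean(a + b * theta + c * theta**2 < 0.0)
```

Whenever $a > b^2/(4c)$ holds, as it does here, the asymmetric yield is positive for every value of θ, whereas the linear parameterisation always has a Gaussian tail below zero.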
Typically in searches for new physics, limits on models for new physics are determined using ratios of the likelihood at different values of the parameters of interest. In the simplest case, a single parameter of interest, the signal strength $\mu$, scales the expected signal yields, and the expected number of events in bin I becomes
  $n_I = \mu\, n_{s,I} + n_{b,I}$,   (5.7)

where the yields $n_{s,I}$ here refer explicitly to the expected contributions from signal for a specified hypothesis. In order to remove the dependence of the likelihood on the nuisance parameters θ, the nuisance parameter values are set to those at which the likelihood attains its maximum for a given set of $n^{\rm obs}_I$. This is commonly referred to as 'profiling' over the nuisance parameters.
The test statistic $t_\mu$ is then defined using the ratio,
  $t_\mu = -2 \ln \frac{L^{\rm max}_S(\mu)}{L^{\rm max}_S}$,
where $L^{\rm max}_S(\mu)$ is the likelihood maximised over the nuisance parameters at fixed µ, and $L^{\rm max}_S$ denotes the maximum value of $L^{\rm max}_S(\mu)$ over all values of µ. Similarly, such likelihood ratios are also used for quantifying an excess in the case of the discovery of new physics [13]. The test statistic can also be constructed for the experimental likelihood $L(\mu, \delta)\pi(\delta)$, where the same substitution as in Eq. (5.7) is applied, by profiling the elementary nuisance parameters δ. A direct comparison of the test statistic for the full and simplified likelihoods, as a function of µ, is therefore possible. Figure 5 shows a comparison of the value of $t_\mu$ as a function of µ for the pseudo-search between the full (experimental) likelihood and the asymmetric SL. In addition, the result obtained using only the symmetric SL is shown. As expected, the agreement between the full and simplified likelihoods is greatly improved when including the quadratic term. A horizontal line is drawn at the value $t_\mu = 3.84$. The agreement in this region is particularly relevant because asymptotic approximations for the distribution of $t_\mu$ [14] allow one to determine the 95% confidence level (CL) upper limit on the signal strength, $\mu_{\rm up}$. The signal hypothesis is 'excluded' at 95% CL if $\mu_{\rm up} < 1$.
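Profiling the SL and evaluating $t_\mu$ takes only a few lines with a general-purpose minimiser. The sketch below is our own minimal implementation, not the reference code: it assumes diagonal quadratic terms $n_{b,I}(\theta) = a_I + b_I\theta_I + c_I\theta_I^2$ and a multivariate-normal constraint with correlation matrix ρ, and all function names are ours:

```python
import numpy as np
from scipy.optimize import minimize

def make_nll(n_obs, n_sig, a, b, c, rho):
    """Negative log of L_S(mu, theta) * pi(theta), up to constants."""
    rho_inv = np.linalg.inv(rho)
    def nll(params):
        mu, theta = params[0], params[1:]
        lam = mu * n_sig + a + b * theta + c * theta**2
        if np.any(lam <= 0.0):
            return 1e12                       # guard: Poisson rates must stay positive
        return np.sum(lam - n_obs * np.log(lam)) + 0.5 * theta @ rho_inv @ theta
    return nll

def t_mu(mu, n_obs, n_sig, a, b, c, rho):
    """Profile-likelihood-ratio statistic t_mu = -2 ln[L_S^max(mu) / L_S^max]."""
    nll = make_nll(n_obs, n_sig, a, b, c, rho)
    k = len(n_obs)
    fixed = minimize(lambda th: nll(np.concatenate(([mu], th))),
                     np.zeros(k), method="BFGS")      # profile theta at fixed mu
    best = minimize(nll, np.zeros(k + 1), method="BFGS")  # global fit in (mu, theta)
    return 2.0 * (fixed.fun - best.fun)
```

For background-only data, $t_\mu$ vanishes near $\mu = 0$ and grows with µ, as in Figure 5.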
When determining the SL coefficients, we have relied on pseudo-datasets, as we expect this will often be the case for anyone providing SL inputs for real analyses. The accuracy of the SL coefficients will necessarily depend on the number of pseudo-datasets used to calculate them. To investigate this, we have studied the rate of convergence of the SL coefficients by calculating them using several different numbers of pseudo-datasets, the largest being 100,000. The coefficients for the three bins calculated using 100,000 pseudo-datasets are: a = 84.9, b = 8.27, c = 0.32 for bin 4; a = 2.61, b = 0.90, c = 0.11 for bin 62; and a = 0.90, b = 0.47, c = 0.13 for bin 86. The calculation of the coefficients is repeated using many independent sets of a fixed number of pseudo-datasets, resulting in a distribution of calculated values for each coefficient.
The root mean square (RMS) of the resulting distributions provides an estimate of how much variation can be expected in the calculation of the SL coefficients given a limited pseudo-data sample size. The RMS values are normalised to the RMS of the distributions obtained with a sample size of 100,000 pseudo-datasets, giving a relative RMS. The relative RMS of the distribution of the coefficients, calculated using increasing numbers of pseudo-datasets, is shown in Figure 6.
The coefficients a and b can be calculated with relatively high precision using only 1000 pseudo-datasets in each case. This is true whether the value of b is large compared to a, as in the case of bin 86, or not, as in the case of bin 4. The determination of the c coefficient for bin 4, however, is slower to converge, requiring 5000-10,000 pseudo-datasets for an accurate calculation. Since the value of c for this bin is relatively small compared to b, the coefficient is less relevant, so a poor accuracy will have little effect on the accuracy of the SL. In bin 86, the value of c is relatively large compared to b, meaning it will contribute significantly to the SL. In this case, the convergence is quite fast, with only 2,500 pseudo-datasets required to achieve a 10% accuracy on the value of c. We find that this property generally holds in our study: bins with large values of c compared to b require fewer pseudo-datasets to achieve a good accuracy than bins for which the c value is less relevant.
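A convergence study of this type is simple to reproduce in miniature. The toy below, an illustration under our own simplifying assumption that the pseudo-dataset yields follow exactly $a + b\theta + c\theta^2$ for a single bin (bin-62-like values), estimates c from repeated pseudo-data samples of two different sizes and compares the spreads:

```python
import numpy as np

rng = np.random.default_rng(7)
a_true, b_true, c_true = 2.61, 0.90, 0.11   # bin-62 values quoted above

def c_from_pseudodata(n):
    """Estimate the quadratic coefficient c from n pseudo-dataset yields."""
    theta = rng.standard_normal(n)
    x = a_true + b_true * theta + c_true * theta**2
    m1 = x.mean()
    m2 = x.var()
    m3 = np.mean((x - m1) ** 3)
    # invert m3 = 6 b^2 c + 8 c^3 with b^2 = m2 - 2c^2: -4c^3 + 6 m2 c - m3 = 0
    roots = np.roots([-4.0, 0.0, 6.0 * m2, -m3])
    real = roots[np.abs(roots.imag) < 1e-9].real
    ok = real[2.0 * real**2 <= m2]
    if ok.size == 0:                        # fall back to the boundary value
        return np.sign(m3) * np.sqrt(m2 / 2.0)
    return ok[np.argmin(np.abs(ok))]

# spread of the estimated c for two pseudo-dataset sample sizes
spread = {n: np.std([c_from_pseudodata(n) for _ in range(200)])
          for n in (500, 20_000)}
```

As expected from the $1/\sqrt{n}$ scaling of the moment estimates, the spread shrinks by roughly $\sqrt{40} \approx 6$ between the two sample sizes.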

Conclusions
The transmission of highly complex LHC likelihoods from the experimental collaborations to the scientific community has been a long-standing issue. In this paper, we have proposed a simplified likelihood framework which can account for non-Gaussianities, as a convenient form of presentation with a sound theoretical basis.
Although the SL is accurate, it remains an approximation of the full experimental likelihood, so the collaborations do not have to release their full statistical model. For the public, meanwhile, a good approximation of the true likelihood is sufficient for most phenomenology purposes. Moreover, the SL is very simple to transmit, requiring little effort from the experimentalists to release it and little effort from the user to construct it. Additionally, with some standardisation, part of this transmission process can be automated.
In this paper we introduced the formalism for the asymmetric version of the SL. This formalism follows directly from the central limit behaviour of the combination of systematic uncertainties: asymmetry is recognised as the subleading term of the asymptotic distribution dictated by the CLT, which is then recast in a convenient form in the SL formulation. The inclusion of asymmetry completes the SL and provides a fully reliable framework.
The asymmetric SL can be built either from the elementary systematic uncertainties themselves or from the first three moments of the combination of the systematic uncertainties, which are easily obtained via MC generators. Using a realistic LHC-like pseudo-search for new physics, we demonstrated that including asymmetry in the SL provides an important gain in accuracy, and that higher moments are unlikely to be needed.
In practice, for the transmission of the SL data from an experiment to the public, our recommendation is simply to release the first three moments of the combined uncertainties, preferably via the HepData repository in the error source format. The SL framework is flexible in the sense that it can be applied to one or more subsets of the systematic uncertainties, and the HepData error source format has adequate flexibility to accommodate any partition of the uncertainties the releaser wishes to make.
If systematically adopted by the experimental and theory communities, the SL has the potential to considerably improve both the documentation and the re-interpretation of the LHC results.
A The CLT at next-to-leading order

Let us show in a 1D example how the skew appears in the asymptotic distribution. Consider N independent centred nuisance parameters $\delta_j$ of variance $\sigma^2$ and third moment $\gamma$. Define
  $Z = \frac{1}{\sqrt N} \sum_{j=1}^N \delta_j$.   (A.1)
The characteristic function of Z is given by
  $\varphi_Z(x) = \prod_{j=1}^N \varphi_j\!\left(\frac{x}{\sqrt N}\right)$,   (A.2)
where $\varphi_j(x) = E[e^{ix\delta_j}]$. In the large-N limit, each individual characteristic function has the expansion
  $\varphi_j\!\left(\frac{x}{\sqrt N}\right) = 1 - \frac{\sigma^2 x^2}{2N} - \frac{i\gamma x^3}{6N^{3/2}} + O(N^{-2})$.   (A.3)
It follows that the full characteristic function $\varphi_Z$ then simplifies to
  $\varphi_Z(x) = \exp\!\left(-\frac{\sigma^2 x^2}{2} - \frac{i\gamma x^3}{6\sqrt N}\right) + O(N^{-1})$.   (A.4)
This characteristic function is simple but has no exact inverse Fourier transform.
To go further, let us observe that the random variable Z could in principle be written in terms of a normally distributed variable $\theta \sim N(0, \sigma^2)$, with $Z = \phi(\theta)$ where $\phi$ is a mapping which is in general unknown. At large N, however, we know that Z tends to a normal distribution, hence $\phi$ tends to the identity. Thus we can write $Z = \sqrt N\, \phi\!\left(\frac{\theta}{\sqrt N}\right)$ and Taylor expand for large N,
  $Z = \theta + \frac{c}{\sqrt N}\left(\theta^2 - \sigma^2\right) + O(N^{-1})$,   (A.5)
where the constant term keeps Z centred. Let us now compare the characteristic function of this expansion to Eq. (A.4). We find that the characteristic function is given by
  $E[e^{ixZ}] = \exp\!\left(-\frac{\sigma^2 x^2}{2} - \frac{i c\,\sigma^4 x^3}{\sqrt N}\right) + O(N^{-1})$,   (A.6)
after using the large-N expansion. This function matches Eq. (A.4) for $c = \gamma/(6\sigma^4)$. Thus we have found that the normal expansion provides a way to encode skewness in the large-N limit. Namely, we find that the Z variable converges following
  $Z \to \theta + \frac{\gamma}{6\sigma^4\sqrt N}\left(\theta^2 - \sigma^2\right)$.   (A.7)
When the quadratic term becomes negligible the distribution becomes symmetric, and we recover the usual CLT. We can also see that for finite N (as opposed to N → ∞) the support of Z is not R. For example, for $\gamma > 0$, minimising the quadratic over θ gives $Z > -\frac{3\sigma^4\sqrt N}{2\gamma} + O(N^{-1/2})$.
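The next-to-leading-order CLT result can be checked numerically. The toy below, a sketch of our own using centred exponential nuisances (for which $\sigma^2 = 1$ and $\gamma = 2$), compares the skewness of Z against that of the surrogate variable of Eq. (A.7) and verifies the finite-N lower bound on its support:

```python
import numpy as np

rng = np.random.default_rng(3)
N, n_samp = 50, 200_000
sigma2, gamma = 1.0, 2.0   # centred Exp(1): variance 1, third moment 2

# Build Z = (1/sqrt(N)) sum_j delta_j without storing an (n_samp, N) array
Z = np.zeros(n_samp)
for _ in range(N):
    Z += rng.exponential(1.0, n_samp) - 1.0
Z /= np.sqrt(N)

# NLO surrogate: theta + c (theta^2 - sigma^2)/sqrt(N), c = gamma/(6 sigma^4)
c = gamma / (6.0 * sigma2**2)
theta = rng.normal(0.0, np.sqrt(sigma2), n_samp)
Z_nlo = theta + c * (theta**2 - sigma2) / np.sqrt(N)

def skew(x):
    return np.mean((x - x.mean()) ** 3) / x.var() ** 1.5
```

Both variables reproduce the theoretical skewness $\gamma/(\sigma^3\sqrt N)$, and the surrogate never falls below the analytic minimum of the quadratic map.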
It includes functions to calculate the SL coefficients $a_I$, $b_I$, $c_I$ and $\rho_{IJ}$ from provided moments $m_{1,I}$, $m_{2,IJ}$ and $m_{3,I}$, and an SLParams class which computes these together with higher-level statistics such as profile likelihoods, log likelihood-ratios, and related limit-setting measures computed using observed and expected signal yields. For convergence efficiency, the profile-likelihood computation makes use of the gradients of the SL log-likelihood with respect to the signal strength µ and the nuisance parameters θ, which we reproduce here to assist independent implementations:
  $\frac{\partial}{\partial\mu} \ln\left[L_S(\mu,\theta)\pi(\theta)\right] = \sum_I n_{s,I}\left(\frac{n^{\rm obs}_I}{\mu\, n_{s,I} + n_{b,I}(\theta)} - 1\right)$,
  $\frac{\partial}{\partial\theta_I} \ln\left[L_S(\mu,\theta)\pi(\theta)\right] = \left(b_I + 2 c_I \theta_I\right)\left(\frac{n^{\rm obs}_I}{\mu\, n_{s,I} + n_{b,I}(\theta)} - 1\right) - \sum_J (\rho^{-1})_{IJ}\,\theta_J$,
where $n_{b,I}(\theta) = a_I + b_I \theta_I + c_I \theta_I^2$. The reference code has been written with ease of reverse engineering and comprehensibility of the calculations explicitly in mind. While it computes likelihood statistics on a reasonable timescale, further (but less readable) optimisations can be added for production code.
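Independent implementations of these gradients are easy to validate against finite differences. The following sketch (our own code, not the reference implementation) implements the two expressions above for a two-bin toy and cross-checks them numerically:

```python
import numpy as np

def ln_L(mu, theta, n_obs, n_sig, a, b, c, rho_inv):
    """ln[L_S(mu, theta) pi(theta)] up to mu- and theta-independent constants."""
    lam = mu * n_sig + a + b * theta + c * theta**2
    return np.sum(n_obs * np.log(lam) - lam) - 0.5 * theta @ rho_inv @ theta

def grad_ln_L(mu, theta, n_obs, n_sig, a, b, c, rho_inv):
    """Analytic gradients, as reproduced in the text."""
    lam = mu * n_sig + a + b * theta + c * theta**2
    resid = n_obs / lam - 1.0
    d_mu = np.sum(n_sig * resid)
    d_theta = (b + 2.0 * c * theta) * resid - rho_inv @ theta
    return d_mu, d_theta

# finite-difference cross-check at an arbitrary point
n_obs = np.array([52.0, 31.0]); n_sig = np.array([10.0, 5.0])
a = np.array([50.0, 30.0]); b = np.array([5.0, 3.0]); c = np.array([0.5, 0.3])
rho_inv = np.linalg.inv(np.array([[1.0, 0.2], [0.2, 1.0]]))
mu, theta = 0.7, np.array([0.3, -0.4])
eps = 1e-6
d_mu, d_theta = grad_ln_L(mu, theta, n_obs, n_sig, a, b, c, rho_inv)
num_mu = (ln_L(mu + eps, theta, n_obs, n_sig, a, b, c, rho_inv)
          - ln_L(mu - eps, theta, n_obs, n_sig, a, b, c, rho_inv)) / (2 * eps)
e0 = np.array([eps, 0.0])
num_t0 = (ln_L(mu, theta + e0, n_obs, n_sig, a, b, c, rho_inv)
          - ln_L(mu, theta - e0, n_obs, n_sig, a, b, c, rho_inv)) / (2 * eps)
```

The agreement to finite-difference precision confirms the algebra of the Poisson and multivariate-normal terms.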
A demo of the construction of the simplified likelihood, and profiling as a function of a signal strength parameter, is given in simplikedemo.py. Finally, the SL pseudo-data are available on the HepData repository at https://www.hepdata.net/record/sandbox/ 1535641814.