A binned likelihood for stochastic models

Metrics of model goodness-of-fit, model comparison, and model parameter estimation are the main categories of statistical problems in science. Bayesian and frequentist methods that address these questions often rely on a likelihood function, the key ingredient for assessing the plausibility of model parameters given observed data. In some complex systems or experimental setups, predicting the outcome of a model cannot be done analytically, and Monte Carlo techniques are used. In this paper, we present a new analytic likelihood that takes into account Monte Carlo uncertainties and is appropriate for use in both the large and small sample size limits. Our formulation performs better than semi-analytic methods, guards against overconfident claims driven by limited Monte Carlo statistics, and provides improved coverage properties compared to available methods.


Introduction
The use of Monte Carlo (MC) techniques to calculate nontrivial theoretical quantities and expectations in complex experimental settings is common practice in particle physics. An MC event is a single representation of what can be detected in data and is typically generated from a single realization of the underlying physics parameters, θ_g. These events are often binned in some observable space and compared with the data. Since the generation process is stochastic, a particular θ_g used for generating the MC can lead to different outputs. This stochasticity introduces an uncertainty in the MC distributions. Furthermore, as producing large MC sets is often time-consuming, reweighting is used to move from one hypothesis to another. In reweighting, each MC event is assigned a new weight, w(θ), that accounts for the difference between the generation parameters θ_g and the hypothesis parameters θ [1]. It follows that MC uncertainties are hypothesis dependent; thus, to do hypothesis testing, it is important to account for them. This is especially important for small-signal searches, performed in the small sample limit, where a modified χ² may not be suitable [2]. A Poisson likelihood is a more appropriate statistical description of event counts [3], but in that case a proper treatment of MC statistical uncertainties is less straightforward. Solutions to this problem have been discussed in the literature in the context of frequentist statistics by adding nuisance parameters [4][5][6], as well as through a detailed probabilistic treatment of MC weights [7]. However, [4, 6, 7] add additional time complexity, and [5] does not provide a full exposition of how to incorporate weighted MC. We present a new treatment, based on the Poisson likelihood, that is valid in both the large and small data sample size limits and is suited for frequentist and Bayesian analyses.
Our likelihood accounts for statistical uncertainties due to MC, allows for arbitrary event-by-event reweighting, and is computationally efficient. A test statistic based on the proposed likelihood is found to follow a distribution closer to the asymptotic form expected from Wilks' theorem than other likelihoods in the literature. An implementation of the likelihood described in this work can be found in [8].
This paper is organized as follows. In section 2 we briefly review two common treatments available in the literature to account for MC statistical uncertainty. In section 3 we define and discuss our new likelihood. In section 4 we study the performance of the likelihood through an example and compare it to other likelihoods in the literature. In section 5 we provide our conclusions. A summary of the likelihoods discussed in the paper, including our main result, is given in appendix A.

The Poisson likelihood and previous work
In order to compare MC with data, events are often binned into distributions across a set of observables. For simplicity we focus on a single bin; in the absence of cross-bin-correlated systematic uncertainties, the generalization to multiple bins is simply a product of the single-bin likelihoods. This is assumed for the remainder of the paper. It is well known that the count of independent, rare natural processes can be described by the Poisson likelihood,

L(θ|k) = λ(θ)^k e^(−λ(θ)) / k!,   (2.1)

where λ(θ) is the expected bin count for a hypothesis and k is the number of observed data events. Equation (2.1) requires exact knowledge of the expected bin count, λ(θ). In the case of complex experiments it is often not possible to obtain λ(θ) exactly, and MC techniques are used to estimate the expected distributions. For weighted MC, a direct substitution of λ(θ) by Σ_i w_i(θ) is often used, where w_i are the weights of the MC events in the bin. Then eq. (2.1) can be approximated as

L_AdHoc(θ|k) = (Σ_i w_i(θ))^k e^(−Σ_i w_i(θ)) / k!.   (2.2)

This ad hoc treatment assumes that the MC estimate of the expected bin count exactly matches the true expectation of the model, neglecting the stochastic nature of MC. In the case of large MC, eq. (2.2) converges to eq. (2.1) for the hypothesis given by θ.
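In code, the ad hoc treatment amounts to evaluating a Poisson likelihood at the sum of the MC weights. A minimal Python sketch in log space (to avoid overflow; the function names are ours, not from the implementation in [8]):

```python
import math

def log_poisson(k, lam):
    # Log of the Poisson likelihood of eq. (2.1): L(lambda|k) = lambda^k e^(-lambda) / k!
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def log_adhoc(k, weights):
    # Ad hoc treatment of eq. (2.2): substitute lambda by the sum of MC weights.
    return log_poisson(k, sum(weights))
```

Working in log space is the standard numerical choice, since the factorial and exponential terms overflow quickly for realistic bin counts.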

The Barlow-Beeston likelihood
To treat MC statistical uncertainties in the small sample limit, a modification of the Poisson likelihood was introduced in [4], which is briefly covered below. First, note that the expectation in a single bin receives contributions from different physical processes, which we index by j. The number of expected events can then be written as

λ(θ) = Σ_{j=1}^{s} n̄_j,   (2.3)

where n̄_j is the expected number of MC events from process j that fall in the bin and s is the total number of relevant processes. Substituting eq. (2.3) into eq. (2.1) gives the Poisson likelihood for observing k data events. For stochastic models, n̄_j is unknown. Instead, the MC outcome can be modeled as having drawn n_j events from a random process that simulates the physical process. We can approximate n_j as being drawn from a Poisson process with mean n̄_j.¹ Profiling over the true number of MC events per process in the bin results in the Barlow-Beeston (BB) likelihood [4],

L_BB(θ|k) = max_{ {n̄_j} } [ Poisson(k; λ(θ)) Π_{j=1}^{s} Poisson(n_j; n̄_j) ],   (2.4)

where λ(θ) is given by eq. (2.3), n_j and n̄_j are the estimated and true MC counts in the bin respectively, and {n̄_j}_{j=1}^{s} denotes the s nuisance parameters we have profiled over. In the above formalism the MC is produced at the natural rate, but this is not the case for weighted MC. The prescription for weighted MC is given by replacing eq. (2.3) with

λ(θ) = Σ_{j=1}^{s} η_j(θ) n̄_j,   (2.5)

where η_j(θ) is a scale factor for process j that accounts for the difference between the MC generation and the target hypothesis of interest. In this case, the likelihood definition is still given by eq. (2.4); an explicit formula for s = 1 is given in the appendix. However, for arbitrary weight distributions per physical process L_BB may not be appropriate, as it neglects the variance of a sum of weights [4]. It remains valid only in the case where the distribution of weights for each process is narrow.
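For a single process (s = 1) with scale factor η, the profiling in eq. (2.4) can be carried out in closed form: maximizing Poisson(k; η n̄) · Poisson(n; n̄) over n̄ gives n̄ = (k + n)/(1 + η). A sketch of this special case (the closed form is our own derivation shown for illustration; names are ours):

```python
import math

def log_poisson(k, lam):
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def log_bb_single(k, n, eta):
    # Barlow-Beeston log-likelihood for one MC process with scale factor eta.
    # Profiling over the true MC mean nbar: d/dnbar [k ln(eta*nbar) - eta*nbar
    # + n ln(nbar) - nbar] = 0 gives the closed form below for s = 1.
    nbar_hat = (k + n) / (1.0 + eta)
    return log_poisson(k, eta * nbar_hat) + log_poisson(n, nbar_hat)
```

For s > 1 no such closed form exists and each process requires a numerical root-finding step, which is the source of the extra runtime cost discussed in section 4.4.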

Uncertainties in the large-sample limit
In the large-sample regime, the Gaussian distribution is an appropriate description of the observed data. In this limit, the use of Pearson's χ² as a test-statistic [9] is common practice. For a single analysis bin, Pearson's χ² is defined as

χ² = (k − λ(θ))² / λ(θ),

¹ The MC generation is a binomial process: a fixed number of events, N_j, is generated for each process and accepted into the bin of interest with probability β_j(θ), such that n̄_j(θ) = β_j(θ) N_j. In the limit of both rare processes (β_j ≪ 1) and a large number of generated events (N_j ≫ 1), the total number of observed events can be approximated as Poisson distributed with mean λ(θ) = Σ_j β_j(θ) N_j = Σ_j n̄_j(θ).

JHEP06(2019)030
where we continue to use the approximation λ(θ) = Σ_i w_i(θ) and w_i are the weights of the MC events. The form of Pearson's χ² arises from the fact that the Gaussian distribution of k is the large-sample limit of a Poisson distribution, for which the expected statistical variance of the observation is given by λ(θ). Systematic uncertainties, under the assumption that they follow a Gaussian distribution and are independent between bins, can be included as

χ² = (k − λ(θ))² / (λ(θ) + σ_syst²),

where σ_syst² is the systematic variance in the bin.
However, this method of incorporating systematic uncertainties tends to overestimate them in shape-only analyses; see [10] for a recent discussion in the context of reactor neutrino anomalies. Similarly, one can include uncertainties in the test-statistic to account for statistical fluctuations of the MC. In doing so, the Gaussian behavior is implicit and the modified χ² reads

χ²_mod = (k − λ(θ))² / (λ(θ) + σ_mc²),

where σ_mc² is the MC statistical uncertainty in the bin, given by

σ_mc² = Σ_i w_i(θ)².

Note that this test-statistic is not appropriate in the small-sample regime, as the data are no longer well described by a Gaussian distribution. If one uses a χ² test-statistic in the small-sample regime, one ought to compute the test-statistic distribution properly to achieve appropriate coverage [11].
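The modified χ² reduces to a few lines once λ is approximated by the sum of weights; as a sketch, we take σ_mc² to be the sum of squared weights, consistent with the definition of σ² in eq. (3.3):

```python
def modified_chi2(k, weights):
    # Modified chi^2: the MC statistical variance sigma_mc^2 = sum(w_i^2)
    # is added to the Poisson variance lambda = sum(w_i) in the denominator.
    lam = sum(weights)
    sigma2_mc = sum(w * w for w in weights)
    return (k - lam) ** 2 / (lam + sigma2_mc)
```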

Generalization of the Poisson likelihood
Ideally we would like to obtain the expected event count for any hypothesis, λ(θ); however, we are considering problems where this relationship is not known and λ is instead estimated by MC. The key difference is that instead of using exact knowledge of λ we perform Bayesian inference to obtain P(λ|θ) using the available MC. Assuming the weights are functions of θ, we have

L_General(θ|k) = ∫₀^∞ dλ (λ^k e^(−λ) / k!) P(λ|w(θ)),   (3.1)

where the distribution of λ, P(λ|w(θ)), is inferred from the MC. The likelihood L_AdHoc in eq. (2.2) is recovered when P(λ|w(θ)) = δ(λ − Σ_i w_i(θ)), but this is clearly an unrealistic assumption, as it presumes perfect knowledge of the parameter λ(θ) from a finite number of realizations. Instead, it is more appropriate to construct P(λ|w(θ)) from the MC realization. This is given by

P(λ|w(θ)) ∝ L(λ|w(θ)) P(λ),   (3.2)

where P(λ) is a prior on λ that must be chosen appropriately and L(λ|w(θ)) is the likelihood of λ given w(θ). This is similar to [4, 5], but instead of fitting λ as a nuisance parameter as in L_BB in eq. (2.4), we marginalize over it in eq. (3.1) as informed by the MC weights. When L_General is used in a frequentist approach, the marginalization over λ implies a hybrid Bayesian-frequentist construction, similar to the treatment of nuisance parameters described in [12] and employed in [13, 14]. This section is organized as follows. We first derive L(λ|w(θ)) assuming identical weights in section 3.1, then extend it to arbitrary weights in section 3.2. With this in hand, we calculate an analytic expression for eq. (3.1) using eq. (3.2) under a uniform P(λ) in section 3.3. In section 3.4 we briefly discuss a family of distributions as possible alternative priors. In section 3.5 we show that our effective likelihood converges to eq. (2.2) in the limit of large MC size. Finally, in section 3.6 we provide some intuition on the behavior of our generalized likelihood. Equation (3.16), along with the definitions of μ and σ² given in eq. (3.3), constitutes the primary result of this work.

Derivation of L(λ| w( θ)) for identical weights
In this section we derive L(λ|w(θ)) for identical weights. We will show that L(λ|w(θ)) can be written in terms of two quantities for a bin with m MC events:

μ ≡ Σ_{i=1}^{m} w_i,   σ² ≡ Σ_{i=1}^{m} w_i².   (3.3)
For identical weights, w ≡ w_i ∀i, the following equalities hold:

μ = w m,   σ² = w² m,   (3.4)

so that m = μ²/σ² and w = σ²/μ. Assume that m is the outcome of sampling a Poisson-distributed random variable M with probability mass function

Poisson(M = m; m̄) = m̄^m e^(−m̄) / m!,   (3.5)

where m̄ is the mean of the distribution. Further, assume that the expected number of data events is λ = w m̄, so that m̄ = λ/w. Substituting back into eq. (3.5), we can interpret Poisson(M = m; m̄) as a likelihood function of λ,

L(λ|w(θ)) = (λ/w)^m e^(−λ/w) / m!,   (3.6)

since μ and σ fully specify w(θ) for identical weights.

Extension to arbitrary weights
The derivation above assumed identical weights. For arbitrary weights, µ is an outcome sampled from a compound Poisson distribution (CPD), which can be approximated by a

scaled Poisson distribution (SPD) by matching the first and second moments of the two distributions [15]. In order to make the connection, first rewrite μ and σ² as

μ = w_Eff m_Eff,   σ² = w_Eff² m_Eff,   (3.7)

where m_Eff is the effective number of MC events and w_Eff the effective weight. From eq. (3.4) these are given by m_Eff = μ²/σ² and w_Eff = σ²/μ. Next, assume m̄ = λ/w_Eff and

Poisson(M = m_Eff; m̄) = m̄^(m_Eff) e^(−m̄) / Γ(m_Eff + 1),   (3.8)

where λ again is the expected number of events in data. Equation (3.8) can be written as a likelihood function of λ,

L(λ|w(θ)) ∝ (λ/w_Eff)^(m_Eff) e^(−λ/w_Eff) / Γ(m_Eff + 1),   (3.9)

which is identical to eq. (3.6) except that the denominator is now a gamma function instead of a factorial. However, since the denominator does not depend on λ, it cancels in eq. (3.2).
To understand this approximation, note that the maximum likelihood in eq. (3.8) occurs when m̄ = m_Eff. The first and second moments of the SPD random variable w_Eff M, where M ∼ Poisson(m_Eff), are given by

E[w_Eff M] = w_Eff m_Eff = μ,   Var[w_Eff M] = w_Eff² m_Eff = σ².

This shows that the SPD, under the maximum likelihood solution for the given MC realization, has first and second moments that match the sample mean, μ, and variance, σ², respectively. These are equal to the first and second moments of the CPD, as described in [15]. By assuming that μ is drawn from an SPD, we can treat μ and σ as outcomes that fix the likelihood function of the underlying scaled expectation λ, analogous to the case of identical weights. Because both the first and second moments are matched, this approximation accounts for the variance of the CPD, unlike L_BB, which only accounts for the mean. Thus, while L_BB is valid only for narrow weight distributions, our approximation remains valid for broader distributions.
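The effective quantities are simple summary statistics of the bin's weights; a minimal sketch (the function name is ours):

```python
def effective_counts(weights):
    # mu = sum(w_i) and sigma^2 = sum(w_i^2) as in eq. (3.3);
    # m_eff = mu^2 / sigma^2 (effective number of MC events) and
    # w_eff = sigma^2 / mu (effective weight) match the first two moments
    # of the compound Poisson distribution to a scaled Poisson.
    mu = sum(weights)
    sigma2 = sum(w * w for w in weights)
    return mu * mu / sigma2, sigma2 / mu
```

For identical weights, m_eff reduces to the raw event count m and w_eff to the common weight w, as eq. (3.4) requires.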

Then, assuming a uniform P(λ) and substituting eq. (3.9) for L(λ|w(θ)) in eq. (3.2), we obtain

α = μ²/σ² + 1,   β = μ/σ²,   (3.12)

P(λ|w(θ)) = β^α λ^(α−1) e^(−βλ) / Γ(α) = G(λ; α, β),   (3.13)

where Γ is the gamma function and G the gamma distribution with shape parameter α and inverse-scale parameter β. Note that in going from eq. (3.9) to eq. (3.13), μ and σ² go from random variates for a particular λ to parameters that govern the probability density of λ.
With this choice of L(λ|w(θ)) and P(λ), we can rewrite L_General from eq. (3.1) as

L_Eff(θ|k) = ∫₀^∞ dλ (λ^k e^(−λ) / k!) G(λ; α, β) = (β^α / Γ(α)) · Γ(k + α) / (k! (1 + β)^(k+α)),   (3.16)

where μ and σ² depend on θ through w.
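The closed form above is cheap to evaluate in log space. The sketch below (our names, not from [8]) also exposes (a, b) hyper-parameters for the gamma-prior family discussed in section 3.4, with a = 1, b = 0 reproducing the uniform-prior result and a = b = 0 giving L_Mean:

```python
import math

def log_l_eff(k, weights, a=1.0, b=0.0):
    # Effective likelihood: the Poisson probability of k marginalized over a
    # gamma posterior G(lambda; alpha, beta), with
    #   alpha = mu^2/sigma^2 + a,  beta = mu/sigma^2 + b,
    # yielding the negative-binomial-type closed form
    #   L = beta^alpha Gamma(k+alpha) / (k! Gamma(alpha) (1+beta)^(k+alpha)).
    mu = sum(weights)
    sigma2 = sum(w * w for w in weights)
    alpha = mu * mu / sigma2 + a
    beta = mu / sigma2 + b
    return (alpha * math.log(beta)
            + math.lgamma(k + alpha)
            - math.lgamma(k + 1)
            - math.lgamma(alpha)
            - (k + alpha) * math.log(1.0 + beta))
```

Since the result is a proper probability mass function in k, summing exp(log_l_eff) over all counts should return one, which is a useful sanity check for any implementation.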

A family of likelihoods
It is possible to generalize the choice of α and β in eq. (3.12) by choosing a particular form of P(λ). Since the distribution of interest is a Poisson distribution, a well-motivated choice of P(λ) is a gamma distribution, the conjugate prior of the Poisson distribution [16]; see also [7] for a recent discussion. Thus we set P(λ) = G(λ; a, b), where a and b are the shape and inverse-scale parameters of the gamma distribution, respectively. These hyper-parameters dictate the distribution of the Poisson parameter λ [17]. In line with our previous discussion, the gamma-distribution prior implies that eq. (3.12) becomes

α = μ²/σ² + a,   β = μ/σ² + b.

The rest of the likelihood derivation remains the same. This allows specific values of a and b to be chosen to satisfy certain properties. Equation (3.12) is obtained with a = 1 and b = 0, corresponding to the uniform prior discussed above. Another interesting choice is to require that the mean and variance of P(λ|μ, σ) match μ and σ², respectively. This is achieved by setting a = b = 0, and we refer to this parameter assignment as L_Mean. In the case of identical weights, L_Mean is equivalent to eq. (20) in [7]. Both choices are improper priors, as they are limiting cases of the gamma distribution; nevertheless, they yield proper P(λ|μ, σ) distributions.

In [7], a convolutional approach is suggested for handling arbitrary weights; we refer to this likelihood as L_G. Each weighted MC event has P(λ_i|w_i) = G(λ_i; 1, 1/w_i), corresponding to the prior P(λ_i) = G(λ_i; 0, 0), such that λ = Σ_{i=1}^{m} λ_i. The likelihood L_Mean is a good analytic approximation of the more computationally expensive calculation given in [7] for L_G; the latter has time complexity O(k²m), where k and m are the number of data and MC events in the bin, respectively. When assuming uniform priors, the convolutional approach does not recover eq. (3.13) for identical weights, so it cannot be used as a generalization of L_Eff.

Convergence of the effective likelihood
In this section we will show that, if the relative uncertainty of the bin content vanishes as the MC size increases, L_Eff and L_Mean both converge to L_AdHoc.
For positive weights w_i, the relative uncertainty σ/μ is bounded between zero and one. Uncertainty as large as the estimated quantity, σ/μ = 1, occurs if and only if m = 1. In the limit that σ/μ goes to zero, eq. (3.13) converges to δ(λ − μ) and L_Eff and L_Mean both go to L_AdHoc. We can see this by noting that the shape parameter, α, goes to infinity as the MC relative uncertainty goes to zero, turning the gamma distribution into a Gaussian distribution of mean α/β and variance α/β². This Gaussian converges to δ(λ − μ) in the limit of vanishing σ/μ. Substituting into eq. (3.1), we recover eq. (2.2), which converges to eq. (2.1) in the large MC limit.
It remains to be shown that the relative uncertainty of the bin content vanishes as the MC size increases. For identical weights,

σ/μ = (w √m) / (w m) = 1/√m → 0 as m → ∞.

For arbitrary weights, the limit can be written in terms of the running averages of w_i and w_i² as

lim_{m→∞} σ/μ = lim_{m→∞} sqrt(Σ_i w_i²) / Σ_i w_i = lim_{m→∞} (1/√m) · sqrt(⟨w²⟩_m) / ⟨w⟩_m,

where ⟨w⟩_m is the average of w_i and ⟨w²⟩_m the average of w_i² for i ≤ m. This shows that as long as ⟨w²⟩_m does not grow much faster than ⟨w⟩_m², the limit converges to zero. For weight distributions with positive support and finite, non-zero mean, this should be the case.
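This convergence is easy to check numerically. A sketch (our function names) comparing L_Eff with the ad hoc Poisson likelihood for m identical weights summing to a fixed μ, so that σ/μ = 1/√m:

```python
import math

def log_poisson(k, lam):
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def log_l_eff(k, weights):
    # Uniform-prior effective likelihood, eq. (3.16).
    mu = sum(weights)
    sigma2 = sum(w * w for w in weights)
    alpha = mu * mu / sigma2 + 1.0
    beta = mu / sigma2
    return (alpha * math.log(beta) + math.lgamma(k + alpha)
            - math.lgamma(k + 1) - math.lgamma(alpha)
            - (k + alpha) * math.log(1.0 + beta))

def gap(m, k=8, mu=10.0):
    # |log L_Eff - log L_AdHoc| for m identical weights summing to mu.
    # The gap should shrink as m grows and sigma/mu = 1/sqrt(m) vanishes.
    return abs(log_l_eff(k, [mu / m] * m) - log_poisson(k, mu))
```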

Behavior of the effective likelihood
It is instructive to examine the behavior of L_Eff for a single bin. It is standard to work with the log-likelihood l(μ, σ|k) ≡ −2 ln L(μ, σ|k), and we do so here. Figure 1 shows the contour lines of l_Eff(μ, σ|k = 100). Since μ and σ both depend on the same underlying parameters, θ, a minimization over θ can be thought of as a constrained minimization over μ and σ. This is visualized as the gray region in figure 1, which indicates where μ and σ are allowed to vary for some physics model. To further illustrate the effect of the accessible region, we minimize l_Eff over μ for two possible constraints: fixed σ and fixed σ/μ. In terms of eq. (3.3), a sufficient but not necessary condition for constant σ/μ with varying μ is equal weights, and a necessary but not sufficient condition for constant σ with varying μ is m ≥ 2. For a standard Poisson likelihood, μ̂_Poisson ≡ arg min_μ l_Poisson(μ|k) = k. Figure 2 shows μ̂_Eff ≡ arg min_μ l_Eff(μ, σ|k = 100) as well as the region where l_Eff(μ, σ|k) − l_Eff(μ̂, σ|k) < 1 for fixed σ (left) and fixed σ/μ (right). Note that the shaded regions for fixed σ are calculated without requiring that μ ≥ σ, which would be the case for eq. (3.3). As σ goes to zero, the Poisson best fit and Wilks' 1σ interval are recovered. As σ or σ/μ increases, the shaded region becomes wider, as expected. For fixed σ, μ̂_Eff does not deviate much from μ̂_Poisson, while for fixed σ/μ, μ̂_Eff deviates from μ̂_Poisson as σ/μ increases. The shaded regions correspond to the 1σ interval assuming the approximation from Wilks' theorem and give a sense of the shape of L_Eff projected onto one-dimensional slices.

Example and performance
In practice, likelihoods such as those discussed above are used to estimate physical parameters from data. As discussed in section 1, weighted MC is often used to compute the likelihood of a particular physical scenario given the observed data. Statements are then made about the physical scenarios either by maximizing the likelihood or by examining the posterior distribution under some priors. We examine a toy experiment in which we measure the mode, Ω, and normalization, Φ, of a Gaussian-distributed signal on top of a steeply falling inverse power-law background. The performance of L_Eff is evaluated and compared against other likelihoods.
For our toy experiment, we generate the true energies, E_t, of synthetic data events from a background falling as (E_t/100 GeV)^(−γ_b,t), where γ_b,t = 3.07, and a Gaussian signal centered at Ω_t = 125 GeV with width σ_t = 2 GeV and normalization Φ_t = 5013 for a fixed number of expected events. Our imaginary detector is sensitive in the 100-160 GeV range. To simulate the effect of a real detector, the true energy, E_t, is smeared by 5% for background and 3% for signal to obtain event-by-event reconstructed energies, E_r. We generate a total number of MC events, N_MC, split evenly between the components. Generation is performed assuming inverse power-law distributions, (E_t/100 GeV)^(−γ_g) for signal and (E_t/100 GeV)^(−γ_b,g) for background, with γ_g = 1 and γ_b,g = 2. Reweighting of the MC can then be performed as a function of E_t and forward-folded onto distributions in E_r, over which the events are histogrammed and likelihoods evaluated. A diagram of the steps described above is shown in figure 3. For all toy experiments, the background component, (Φ_b, γ_b), and the signal width, σ, are kept fixed to their true values; only the signal mean, Ω, and normalization, Φ, are treated as free parameters. Figure 4 shows the expectation in E_t as well as the data and the L_Eff best-fit distributions in E_r. The leftmost panel shows the expectation for both signal and background assuming no smearing in E_t. The three other panels show the smeared E_r distributions for data (black) and the best-fit result from L_Eff (orange) for three MC datasets of varying size. The smeared shape of the signal peak is clearly visible in the data, but not in the smallest MC set. As the MC increases in size, the best-fit MC can be seen to converge to the data.
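The generation step can be sketched with inverse-CDF sampling of the power law followed by Gaussian smearing; the signal_fraction below is an illustrative choice, not a value from the paper:

```python
import random

def sample_power_law(gamma, rng, e_min=100.0, e_max=160.0):
    # Inverse-CDF sampling of a spectrum ~ E^(-gamma) on [e_min, e_max],
    # valid for gamma != 1.
    u = rng.random()
    a = e_min ** (1.0 - gamma)
    b = e_max ** (1.0 - gamma)
    return (a + u * (b - a)) ** (1.0 / (1.0 - gamma))

def toy_event(rng, signal_fraction=0.5):
    # One synthetic event: a Gaussian signal at 125 GeV (width 2 GeV) or a
    # power-law background with gamma_b,t = 3.07; the true energy is then
    # smeared by 3% (signal) or 5% (background) to mimic the detector.
    # signal_fraction is a hypothetical mixing parameter for illustration.
    if rng.random() < signal_fraction:
        e_true = rng.gauss(125.0, 2.0)
        return rng.gauss(e_true, 0.03 * e_true)
    e_true = sample_power_law(3.07, rng)
    return rng.gauss(e_true, 0.05 * e_true)
```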

(Figure 3: flow diagram of the toy analysis. Synthetic data generation: produce data, apply the detector simulation (smearing to mimic detector response), and histogram into binned synthetic data. MC: reweight to a physical hypothesis, apply the detector simulation, and histogram into a binned MC expectation. The two are combined in the likelihood L(θ).)
The best-fit values for the example shown in figure 4 are given in table 1 for L_Eff and L_AdHoc. As point estimators, both likelihoods return similar values. This is driven by the fact that the same underlying MC distribution is used to fit the data. The effect of convolving P(λ|w(θ)) with the Poisson distribution mostly serves to broaden the likelihood space, while preserving the maximum within the constraints described in section 3.6. In the large MC limit, both likelihoods can be used for unbiased point estimation, provided that the likelihood space is smooth enough for standard minimization techniques to find the global minimum.

Coverage
Due to the high computational cost of computing frequentist confidence intervals by generating pseudodata to estimate the test-statistic (TS) distribution, it is common to use the approximation given by Wilks' theorem in cases where its underlying assumptions hold. For small MC, a likelihood description that neglects MC uncertainties may lead to undercoverage even for a large data sample. In this section we use TS = ∆l = l(θ_true) − l(θ̂), where θ_true and θ̂ correspond to the true and best-fit (Ω, Φ), respectively. We evaluate the coverage properties, computed using the asymptotic approximation given by Wilks' theorem, of the two-dimensional fit over (Ω, Φ) for several likelihood constructions: the modified χ², L_AdHoc, L_BB, L_Mean, and L_Eff. These five test-statistics were chosen on the basis of their computation speed and as representatives of different approaches to the treatment of weighted MC.
Several configurations were tested, all under the assumptions of the toy experiment described in section 4.1. The MC was generated for two settings of the total number of events: 10^3 and 10^6. For each setting, 500 toy experiments were generated, their best fits found, and their ∆l evaluated. Each toy experiment was classified as covering θ_true at a specified level p if ∆l < I(p; 2), where I is the inverse of the χ² cumulative distribution function and 2 is the number of degrees of freedom. Figure 5 shows the percentage of times the true parameters were within the confidence intervals at level p as a function of the estimated coverage percentile for that level. First note that, as expected, the true coverage depends strongly on the MC size, with larger MC leading to improved agreement. For N_MC = 10^3, L_BB, L_Mean, the modified χ², and L_AdHoc all undercover to varying degrees of severity. For N_MC = 10^6, L_AdHoc still undercovers, which is not surprising as it presumes zero MC uncertainty, but the other likelihoods exhibit good agreement. In this benchmark test, L_Eff exhibits the best coverage properties. However, note that using Wilks' theorem to evaluate confidence intervals implies an asymptotic approximation. In general, this approximation need not hold, and we encourage the reader to always perform coverage tests suited to their particular experimental setup.
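For two degrees of freedom the χ² inverse CDF has a simple closed form, I(p; 2) = −2 ln(1 − p), which makes the classification above easy to script (a sketch, our names):

```python
import math

def chi2_threshold_2dof(p):
    # Inverse CDF of the chi^2 distribution with 2 degrees of freedom:
    # I(p; 2) = -2 ln(1 - p).
    return -2.0 * math.log(1.0 - p)

def covers(delta_l, p):
    # A toy experiment covers theta_true at level p if Delta l < I(p; 2).
    return delta_l < chi2_threshold_2dof(p)
```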

Posterior distributions
It is also possible to use L_Eff in a Bayesian approach. Using Bayes' theorem, the posterior is

P(θ|k) ∝ L_Eff(θ|k) π(θ),

where π(θ) is a prior on the parameters. As evaluation of the normalization factor can be challenging, P(θ|k) can be approximated using a Markov Chain Monte Carlo (MCMC). For our toy example, we used emcee [18] to sample P(θ|k) under a uniform box prior for two likelihood functions: L_Eff and L_AdHoc. The sampling was performed using the data and MC sets described in section 4.1. Figure 6 shows the posterior distributions of Ω and Φ. For each comparison, L_Eff (blue) and L_AdHoc (orange) were sampled using the same underlying data and MC. We used 20 walkers with 300 burn-in steps followed by 1000 steps as settings for emcee. The left and center columns show the marginal posterior distributions of the mass, Ω, and normalization, Φ, respectively; the true value is indicated by the dashed vertical line. The rightmost column shows the joint posterior distribution with 68% (solid) and 95% (dashed) contours; the true values are indicated by the star. With L_AdHoc, the true value of the parameter is highly improbable in the lower MC-size cases of the top and middle rows. In contrast, the posterior evaluated using L_Eff has increased width, reflecting the reduced MC size. Even for N_MC = 10^6 (bottom row), the posterior evaluated using L_AdHoc is narrower than that using L_Eff. Credible regions estimated using L_AdHoc would bias the result.
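The paper's study uses emcee; as a self-contained illustration, the posterior under L_Eff with a flat prior can also be explored with a few lines of Metropolis-Hastings. The reweighting function make_weights below is a hypothetical stand-in for the forward-folding step, not part of the paper's setup:

```python
import math, random

def log_l_eff(k, weights):
    # Uniform-prior effective likelihood, eq. (3.16).
    mu = sum(weights)
    sigma2 = sum(w * w for w in weights)
    alpha = mu * mu / sigma2 + 1.0
    beta = mu / sigma2
    return (alpha * math.log(beta) + math.lgamma(k + alpha)
            - math.lgamma(k + 1) - math.lgamma(alpha)
            - (k + alpha) * math.log(1.0 + beta))

def sample_posterior(k, make_weights, theta0, n_steps=4000, step=1.0, seed=1):
    # Random-walk Metropolis-Hastings with a flat prior on theta > 0.
    # make_weights(theta) maps a parameter value to the bin's MC weights.
    rng = random.Random(seed)
    theta = theta0
    logp = log_l_eff(k, make_weights(theta))
    chain = []
    for _ in range(n_steps):
        prop = theta + rng.gauss(0.0, step)
        if prop > 0:
            logp_prop = log_l_eff(k, make_weights(prop))
            if math.log(rng.random()) < logp_prop - logp:
                theta, logp = prop, logp_prop
        chain.append(theta)
    return chain

# Hypothetical reweighting: theta is the expected rate, spread over 100 MC events.
chain = sample_posterior(k=10, make_weights=lambda t: [t / 100.0] * 100, theta0=8.0)
```

In a real analysis an affine-invariant ensemble sampler such as emcee is preferable; this sketch only shows how L_Eff slots into the log-probability of any MCMC.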

Performance
In this section we compare the runtime cost per likelihood evaluation for a single bin against other treatments available in the literature. We perform our tests on a single Intel Core i5-8350U CPU @ 1.70 GHz running code compiled with clang version 6.0.0-1ubuntu2. We compute the CPU evaluation time for the following likelihoods: L_AdHoc, modified χ², L_G [7], L_BB [4], and L_Eff. For each of them we consider an increasing number of MC events from 10^2 to 10^6, an increasing number of background components from 1 to 10^3, and increasing counts of data events from 10^1 to 10^4. Figure 7 shows the behavior of the runtime with respect to these quantities. At low MC sample sizes the modified χ² is faster than L_Eff, since L_Eff requires the evaluation of more expensive special functions; at larger MC sample sizes this additional cost is negligible compared to that of summing the MC weights. The middle panel of figure 7 shows that all likelihoods except L_BB are constant with respect to the number of background components, as they depend only on summary statistics of the weight distribution. The Barlow-Beeston likelihood, L_BB, incurs an O(bd log d) cost for solving one root-finding problem per physical component, where b is the number of background components and d is the number of digits of precision; its runtime therefore grows with the number of components. One key difference, however, is that L_Eff must compute two summations (the sum of the weights and the sum of the squared weights), while L_BB needs only a single summation of the MC weights. The rightmost panel of figure 7 shows the runtime as a function of the number of data events; for most likelihoods the number of data events, k, enters only through the evaluation of special functions, which for all practical applications are approximately constant in runtime.
L_G evaluates a special function that, for these purposes, can only be computed in O(k²m) time, resulting in its dependence on the number of data events. The L_AdHoc treatment is always the fastest, but it does not incorporate MC statistical uncertainties in any way.

Conclusion
The use of MC to estimate expected outcomes of physical processes is nowadays standard practice. By construction, MC distributions are sample observations and subject to statistical fluctuations. MC events are also typically weighted to a particular physics model, and these weights may not be uniform across all events in an observable bin. A direct comparison of MC distributions to data is typically performed using L_AdHoc or a χ², where the expectation from MC is computed as a sum over weights in a particular observable bin. Such likelihoods neglect the intrinsic MC fluctuations and may lead to vastly underestimated parameter uncertainties in the case of small MC size. A better approach is to use a likelihood that accounts for MC statistical uncertainties.
Along with the definitions of μ and σ² in eq. (3.3), the main result of this work is given in eq. (3.16). This new L_Eff is motivated by treating the MC realization as an observation of a Poisson random variate, computing the likelihood of the expectation using the MC, and marginalizing the Poisson probability of the observed data over all possible expectations. It is an analytic extension of the Poisson likelihood that accounts for MC statistical uncertainty under a uniform prior, P(λ). By assuming that the number of MC events per bin is the outcome of sampling a Poisson-distributed random variable, and that the SPD is a good approximation of the CPD for arbitrary weights, L(λ|w(θ)) can be written in terms of μ and σ² as shown in eq. (3.9). This allows us to calculate L_Eff, given in eq. (3.16), which can be directly substituted for L_AdHoc. Our construction is computationally efficient, exhibits proper limiting behavior, and has excellent coverage properties. In our tests, it outperforms other treatments of MC statistical uncertainty.

Table 2. Table of likelihood formulas. The likelihood functions discussed in this paper are given in each row, written in terms of μ and σ, whose explicit formulas are given in the top row, and the number of observed events, k, in the bin. In the case of L_BB we write the likelihood for the single-process case. Our main result and recommended likelihood, L_Eff, is given in the last row.