1 Introduction

Consider the problem of assessing the efficacy of a planned experiment that will measure event counts that could be ascribed either to a new physics signal or to a standard physics background. The criteria for discovery or exclusion of the signal can be quantified in terms of the p-value. In general, for a given experimental result, p is the probability of obtaining a result of equal or greater incompatibility with a null hypothesis \(H_0\). In high-energy physics searches, for example, one-sided p-values are usually reported in terms of the significance

$$\begin{aligned} Z = \sqrt{2}\, \mathrm{erfc}^{-1}(2p), \end{aligned}$$
(1)

and the criteria for discovery and exclusion have often been taken, somewhat arbitrarily, as \(Z>5\) (\(p< 2.867\times 10^{-7}\)) and \(p<0.05\) \((Z>1.645)\), respectively.
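As a minimal illustration (not part of the paper itself), the conversion in Eq. (1) between p and Z, and its inverse, can be written with `scipy.special`; the helper names are ours:

```python
from scipy.special import erfc, erfcinv

def z_from_p(p):
    """Significance Z for a one-sided p-value, Eq. (1)."""
    return 2.0**0.5 * erfcinv(2.0 * p)

def p_from_z(z):
    """Inverse of Eq. (1): one-sided p-value for a significance Z."""
    return 0.5 * erfc(z / 2.0**0.5)
```

For example, `z_from_p(0.05)` recovers the familiar 1.645, and `p_from_z(5.0)` gives roughly \(2.87\times 10^{-7}\).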

Here, we suppose for simplicity that both signal and background are governed by independent Poisson statistics with means s and b respectively, where s is known and b may be subject to some uncertainty. For assessing the prospects for discovery, one simulates many equivalent pseudo-experiments with data generated under the assumption \(H_{\mathrm{data}} = H_{s+b}\) that both signal and background are present, obtaining observed events \(n_1,n_2,n_3,\ldots \). One then calculates the p-value for each of those simulated experiments (\(p_1,p_2,p_3,\ldots \)) with respect to the null hypothesis \(H_0 = H_{b}\) that only background is present. For exclusion, the roles of the two hypotheses are reversed; the pseudo-experiment data is generated under the assumption \(H_{\mathrm{data}} = H_b\) that only background is present, and the null hypothesis \(H_{0} = H_{s+b}\) is that both signal and background are present, so that a different set of p-values is obtained. The challenge is to synthesize the results in the limit of a very large number of pseudo-experiments into a significance estimate \(Z_{\mathrm{disc}}\) or \(Z_{\mathrm{excl}}\). There is no agreement on this step, which is the primary focus of this paper.

A common measure [1] of the power of an experiment is the median expected significance \(Z^{\mathrm{med}}\) for discovery or exclusion of some important signal (i.e., the median of \(Z(p_1),Z(p_2),Z(p_3),\ldots \) for the simulated p-values). A reason to use the median (rather than mean) is that Eq. (1) is non-linear, so that the mean of a set of Z-values is not the same as the Z-value of the corresponding mean of p-values.

Fig. 1

Expected significances for discovery (left) and exclusion (right), for signal means \(s = 3\), 6, and 12, as functions of the background mean b. Shown are \(Z^{\mathrm{med}}\), \(Z^{\mathrm{mean}}\), \(Z^{\mathrm{A}}\), and the approximations \(Z^{\mathrm{CCGV}}\) and \(Z^{\mathrm{KM}}\) from Refs. [2, 3] and [4]. The median expected significances show a sawtooth behavior, rather than decreasing monotonically with b

However, \(Z^{\mathrm{med}}\) has a counter-intuitive flaw, which is most prominent when s and b are not too large, and especially for exclusion. As we show in the following examples, for a given fixed s, \(Z^{\mathrm{med}}\) can actually significantly increase as b increases. Similarly, for a given fixed b, \(Z^{\mathrm{med}}\) can decrease as s is increased. This leads to the paradoxical situation that an experiment could be judged worse, according to the \(Z^{\mathrm{med}}\) criteria, if it acquires more data, or if it reduces its background. In this paper, we discuss this problem, and consider some alternatives to \(Z^{\mathrm{med}}\).

2 Known background case

The Poisson probability of observing n events, given a mean \(\mu \), is

$$\begin{aligned} P(n|\mu ) = e^{-\mu } \mu ^n/n! . \end{aligned}$$
(2)

Consider first the idealized case that the signal and background Poisson means s and b are both known exactly. One can then generate pseudo-experiment results for n, using \(\mu = s+b\) for the discovery case, and \(\mu = b\) for the exclusion case. A large number of simulated pseudo-experiments can be generated randomly via Monte Carlo simulation methods, as described in the Introduction. However, for all cases in this paper, it is equivalent but much more efficient and accurate to consider exactly once each result n that can contribute non-negligibly, and then weight the results according to the probability of occurrence.
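The weighted-enumeration strategy can be sketched as follows (an illustration using `scipy.stats`; the tail cutoff and helper name are assumptions of this sketch):

```python
import numpy as np
from scipy.stats import poisson

def enumerate_outcomes(mu, tail=1e-12):
    """All outcomes n that contribute non-negligibly for a Poisson mean mu,
    together with their probabilities P(n|mu), replacing random
    pseudo-experiments by an exact weighted enumeration."""
    n_max = int(poisson.isf(tail, mu)) + 1   # discard only a tail of mass ~tail
    n = np.arange(n_max + 1)
    return n, poisson.pmf(n, mu)
```

Any expectation over pseudo-experiments (a mean, median, or quantile of the p- or Z-values) then becomes a weighted sum over these outcomes.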

The p-value for discovery, if n events are observed, is

$$\begin{aligned} p_{\mathrm{disc}}(n,b) = \sum _{k=n}^\infty P(k|b) = \gamma (n,b)/\varGamma (n) , \end{aligned}$$
(3)

while that for exclusion is

$$\begin{aligned} p_{\mathrm{excl}}(n,b,s) = \sum _{k=0}^n P(k|s+b) = \frac{\varGamma (n+1, s+b)}{\varGamma (n+1)} , \end{aligned}$$
(4)

where \(\varGamma (x)\), \(\gamma (x,y)\), and \(\varGamma (x,y)\) are the ordinary, lower incomplete, and upper incomplete gamma functions, respectively. The median p-value among the pseudo-experiments can now be converted, using Eq. (1), to obtain \(Z^{\mathrm{med}}_{\mathrm{disc}}(s,b)\) and \(Z^{\mathrm{med}}_{\mathrm{excl}}(s,b)\).
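In code, Eqs. (3) and (4) and the resulting median discovery significance can be sketched as follows (an illustration; note that scipy's `gammainc` and `gammaincc` are already the regularized functions, i.e. divided by \(\varGamma\)):

```python
from scipy.special import erfcinv, gammainc, gammaincc
from scipy.stats import poisson

def p_disc(n, b):
    """Eq. (3): regularized lower incomplete gamma, gamma(n, b)/Gamma(n)."""
    return gammainc(n, b)

def p_excl(n, b, s):
    """Eq. (4): regularized upper incomplete gamma, Gamma(n+1, s+b)/Gamma(n+1)."""
    return gammaincc(n + 1, s + b)

def z_med_disc(s, b):
    """Median expected discovery significance: the median pseudo-experiment
    observes n equal to the median of Poisson(s + b)."""
    n_med = poisson.median(s + b)
    return 2.0**0.5 * erfcinv(2.0 * p_disc(n_med, b))
```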

Fig. 2

Expected significances for discovery for an extremely small background mean \(b = 10^{-6}\) (left), and exclusion for the strict limit \(b=0\) (right), as functions of the signal mean s. Shown are \(Z^{\mathrm{med}}\), \(Z^{\mathrm{mean}}\), \(Z^{\mathrm{A}}\), and the approximations \(Z^{\mathrm{CCGV}}\) and \(Z^{\mathrm{KM}}\) from Refs. [2, 3] and [4]

Some typical results for \(Z_{\mathrm{disc}}^{\mathrm{med}}\) and \(Z_{\mathrm{excl}}^{\mathrm{med}}\) as functions of b are shown in Fig. 1. They each have a “sawtooth” shape, rather than the monotonic decrease one might expect. This illustrates the unfortunate feature mentioned in the Introduction that the median expected Z can increase with increasing b. As noted in [2, 3] for \(Z_{\mathrm{disc}}^{\mathrm{med}}\), the underlying reason is that the allowed values of n are discrete (integers), causing the median to remain at a fixed value instead of varying continuously in response to changes in s or b. We emphasize that this sawtooth behavior is exactly reproducible for any sufficiently large number of pseudo-experiments, and has nothing to do with randomness from insufficient sampling. It is more prominent for exclusion than for discovery, because the number of events relevant for the median pseudo-experiment is smaller. Also, note that for larger b, the sawteeth get closer together as the integer n of the median gets larger, but the height of the sawtooth envelope remains significant. This is effectively a sort of practical randomness in \(Z^{\mathrm{med}}\), as tiny changes in s or b will move one between the top and the bottom of the sawtooth envelope.

We now consider several alternatives to \(Z^{\mathrm{med}}\). First, one can take the arithmetic mean of the Z-values directly, which we call \(Z^{\mathrm{mean}}\). (In computing \(Z^{\mathrm{mean}}_{\mathrm{disc}}\), we use \(Z=0\) for no observed events, \(n=0\). A reasonable alternative definition for both \(Z^{\mathrm{mean}}_{\mathrm{disc}}\) and \(Z^{\mathrm{mean}}_{\mathrm{excl}}\) would be to use \(Z=0\) for all outcomes n that give a negative Z. That would give slightly larger values for \(Z^{\mathrm{mean}}\), but usually negligibly so except when \(Z^{\mathrm{mean}}\) is uninterestingly small anyway.) Second, one can take the arithmetic mean of the p-values, and then convert these to Z-values, which we call \(Z^{p\,\mathrm{mean}}\). Third, one can consider the Z-value obtained for the mean n (i.e., the average over the simulated \(n_1,n_2,n_3,\ldots \)); the mean data was used for computing the expected significance in [5, 6] and [2, 3], and was called the Asimov data in the latter three references. References [2, 3] obtained an Asimov approximation to \(Z^{\mathrm{med}}_{\mathrm{disc}}\):

$$\begin{aligned} Z_{\mathrm{disc}}^{\mathrm{CCGV}} = \sqrt{2[(s+b)\ln (1+s/b) -s]}, \end{aligned}$$
(5)

and Ref. [4] gave a similar result for exclusion:

$$\begin{aligned} Z_{\mathrm{excl}}^{\mathrm{KM}} = \sqrt{2[s - b\ln (1+s/b)]} . \end{aligned}$$
(6)

These are both based on a likelihood ratio method approximation (valid in the limit of a large event sample) for Z given in [7] in the context of \(\gamma \)-ray astronomy. They both approach the familiar but cruder approximation \(s/\sqrt{b}\), but only in the limit of very large b.
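For reference, the two approximations of Eqs. (5) and (6) are one-liners (a sketch; the function names are ours):

```python
import math

def z_ccgv_disc(s, b):
    """Eq. (5): Asimov-type approximation to the median discovery significance."""
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

def z_km_excl(s, b):
    """Eq. (6): the analogous approximation for exclusion."""
    return math.sqrt(2.0 * (s - b * math.log(1.0 + s / b)))
```

Expanding the logarithm for \(s \ll b\) shows that both reduce to \(s/\sqrt{b}\) in that limit.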

In this paper, we propose instead to simply use for the Asimov approximation the exact p-values in Eqs. (3) and (4) with n replaced by its expected means:

$$\begin{aligned} \langle n_{\mathrm{disc}} \rangle = s+b, \qquad \langle n_{\mathrm{excl}} \rangle = b, \end{aligned}$$
(7)

so that

$$\begin{aligned} p^{\mathrm{Asimov}}_{\mathrm{disc}} = \gamma (s+b,b)/\varGamma (s+b), \end{aligned}$$
(8)
$$\begin{aligned} p^{\mathrm{Asimov}}_{\mathrm{excl}} = \varGamma (b+1,s+b)/\varGamma (b+1), \end{aligned}$$
(9)

which can be readily converted to Z-values using Eq. (1). We call this the “exact Asimov significance” and denote it by \(Z^{\mathrm{A}}\).
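Because scipy's incomplete gamma routines accept non-integer first arguments, Eqs. (8) and (9) can be evaluated directly (a sketch, with helper names of our choosing):

```python
from scipy.special import erfcinv, gammainc, gammaincc

def z_asimov_disc(s, b):
    """Z^A for discovery: Eq. (8), i.e. Eq. (3) at non-integer n = s + b."""
    return 2.0**0.5 * erfcinv(2.0 * gammainc(s + b, b))

def z_asimov_excl(s, b):
    """Z^A for exclusion: Eq. (9), i.e. Eq. (4) at non-integer n = b."""
    return 2.0**0.5 * erfcinv(2.0 * gammaincc(b + 1.0, s + b))
```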

Along with \(Z^{\mathrm{med}}\), Fig. 1 also shows \(Z^{\mathrm{mean}}\) and \(Z^{\mathrm{A}}\) for the discovery and exclusion cases, together with \(Z^{\mathrm{CCGV}}_{\mathrm{disc}}\), and \(Z^{\mathrm{KM}}_{\mathrm{excl}}\), as a function of b, for fixed \(s=3,6,12\). Both \(Z^{\mathrm{mean}}\) and \(Z^{\mathrm{A}}\) are within the \(Z^{\mathrm{med}}\) sawtooth envelopes, but decrease monotonically with b. We conclude that they are both sensible measures of the expected significance. In the discovery case, \(Z^{\mathrm{mean}}\) is generally slightly more conservative than \(Z^{\mathrm{A}}\), and the reverse is true for the exclusion case. The previously known Asimov approximations \(Z^{\mathrm{CCGV}}_{\mathrm{disc}}\) and \(Z^{\mathrm{KM}}_{\mathrm{excl}}\) of Refs. [2, 3] and [4] are considerably less conservative, lying near the upper edges of the \(Z^{\mathrm{med}}\) sawtooth envelopes.

Not shown in Fig. 1 is \(Z^{p\,\mathrm{mean}}\), which we find is much lower than all of the others, due to being dominated by unlikely outcomes with large p-values, and therefore not a reasonable measure of the expected significance. Although we do not recommend its use, we note the amusing fact \(Z^{p\,\mathrm{mean}}_{\mathrm{disc}} = Z^{p\,\mathrm{mean}}_{\mathrm{excl}}\), the proof of which does not rely on the assumed probability distribution, and so also holds exactly in the case of an uncertain background discussed below.

One sometimes sees \(s/\sqrt{b}\) used as an estimate, but this is much larger than the Z’s shown in Fig. 1, and, as is well-known, is not a good estimate of the significance for discovery or exclusion except when b is large.

We close this section by considering the extreme no-background limit \(b \rightarrow 0\), with varying s. Background predictions much smaller than 1 can realistically come about from extrapolations from other regions. For discovery, if one takes \(b=0\) exactly, then the significance for every pseudo-experiment is either zero (the value we have chosen to assign if no events are observed) or infinite (if even one event is observed). Since any non-zero s would provide a non-zero mean number of events, one obtains \(Z^A = \infty \) for all s in that case. The limit \(b \rightarrow 0\) in Eq. (5) is also seen to give \(Z^{\mathrm{CCGV}} = \infty \). For the median expected significance, we instead get an infinitely large sawtooth. This is because the median number of events is 0 if \(s < \ln (2)\), resulting in \(Z^{\mathrm{med}} = 0\), and is at least 1 for all \(s > \ln (2)\), resulting in \(Z^{\mathrm{med}} = \infty \). Therefore, for the discovery case it is perhaps more interesting to take b extremely small, but non-zero. In Fig. 2a, we show \(Z^{\mathrm{med}}\), \(Z^{\mathrm{mean}}\), \(Z^A\), and \(Z^{\mathrm{CCGV}}\), all for the case \(b= 10^{-6}\), as an example of a non-zero but very small expected background. The median number of events in the pseudo-experiments is \(n=0\) for \(0 < s \le s_1\), and is \(n=1\) for \(s_1 \le s \le s_2\), where \(s_1 \approx \ln (2) \approx 0.693\) and \(s_2 \approx 1.678\) is close to the solution of \(s = \ln (2) + \ln (1+s)\).

In contrast, in the exclusion case there is nothing singular for \(b=0\). In particular, Eq. (6) gives \(Z^{\mathrm{KM}}_{\mathrm{excl}}(b=0) = \sqrt{2s}\). In each exclusion pseudo-experiment, the number of events observed is always \(n=0\) because \(b=0\), so that \(Z^{\mathrm{med}}_{\mathrm{excl}} = Z^{\mathrm{mean}}_{\mathrm{excl}} = Z^{\mathrm{A}}_{\mathrm{excl}} = \sqrt{2}\, \mathrm{erfc}^{-1} (2 e^{-s})\) are all obtained from \(p_{\mathrm{excl}} = \varGamma (1,s) = e^{-s}\). These results are illustrated in Fig. 2b, which shows that the estimate \(Z^{\mathrm{KM}}_{\mathrm{excl}}\) in this extreme limit is larger than the others. We note that if \(b=0\), then \(s>2.996\) is needed to give an expected 95% confidence level exclusion \(Z^{\mathrm{mean}}_{\mathrm{excl}} = Z^{\mathrm{med}}_{\mathrm{excl}} = Z^{\mathrm{A}}_{\mathrm{excl}} > 1.645\).
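The quoted threshold is just \(s = \ln 20 \approx 2.996\), the solution of \(e^{-s} = 0.05\); a quick check (an illustrative sketch):

```python
import math
from scipy.special import erfcinv

def z_excl_no_bkg(s):
    """Expected exclusion significance at b = 0, where every
    pseudo-experiment observes n = 0 and p_excl = exp(-s)."""
    return 2.0**0.5 * erfcinv(2.0 * math.exp(-s))

# Signal mean needed for p_excl < 0.05, i.e. an expected 95% CL exclusion:
s_95 = math.log(20.0)   # about 2.996
```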

3 Uncertain background case

More realistically, the expected mean number of background counts can be subject to uncertainties of various sorts. In high-energy physics, the background uncertainty for a future experiment is often dominated by limitations in perturbative theoretical calculations or systematic effects, both of which are unknown (and indeed difficult to rigorously define) but can be roughly estimated or conjectured. There are also statistical uncertainties that will arise from a limited number of events in control or sideband regions. Here, we will consider, in part as a proxy for other types of uncertainties, the “on-off problem” (see for example [7,8,9,10,11,12]), in which the background is estimated by a measurement of m Poisson events in a supposed background-only (off) region. The ratio of the background Poisson mean in this region to the background mean in the signal (on) region is assumed to be a known number \(\tau \). It would also be interesting to consider the case of an uncertainty in \(\tau \) itself, but that is beyond the scope of the present paper. The point estimates for the Poisson mean and the uncertainty of the background in the signal region are then

$$\begin{aligned} {\hat{b}} = m/\tau ,\qquad \varDelta _{{\hat{b}}} = \sqrt{m}/\tau . \end{aligned}$$
(10)

While this Poisson variance is certainly not a rigorous model for systematic or perturbative calculation uncertainties, we propose that it can also be used as a rough proxy for them, in the sense that a proposed estimate for \({\hat{b}}\) and \(\varDelta _{{\hat{b}}}\) can be traded for \((m,\tau )\) in the on-off problem.
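The proposed trade can be sketched by inverting Eq. (10) (a hypothetical helper, not from the paper):

```python
def on_off_params(b_hat, delta_b):
    """Invert Eq. (10): effective off-region count m and ratio tau that
    reproduce a given background estimate b_hat and uncertainty delta_b.
    The returned m need not be an integer."""
    m = (b_hat / delta_b) ** 2
    tau = b_hat / delta_b**2
    return m, tau
```

For example, \({\hat{b}} = 5\) with \(\varDelta _{{\hat{b}}} = 1\) corresponds to \((m, \tau ) = (25, 5)\).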

We now assign probabilities \(\varDelta P\) to each possible count outcome n in the on region, given m events in the off region, following a hybrid Bayesian-frequentist approach by averaging [10,11,12,13,14] over the possible background means using a Bayesian posterior with a flat prior,

$$\begin{aligned} P(b|m,\tau ) = \tau (\tau b)^m e^{-\tau b}/m!, \end{aligned}$$
(11)

(normalized so that \(\int _0^\infty \! db\> P(b|m,\tau ) = 1\)), from which we then find

$$\begin{aligned}&\varDelta P(n,m,\tau ,s) = \int _0^\infty \!\!\! db\> P(b|m,\tau )\> e^{-(s+b)} \frac{(s + b)^n}{n!} \nonumber \\&\quad = \frac{\tau ^{m+1} e^{-s}}{\varGamma (m+1)\varGamma (n+1)}\int _0^\infty \!\!\! db\> b^m (s+b)^n e^{-b(\tau +1)}\nonumber \\&\quad = \frac{\tau ^{m+1} e^{-s}}{\varGamma (m+1)} \sum _{k=0}^n \frac{s^k}{k!\>(n-k)!} \frac{\varGamma (n-k+m+1)}{(\tau +1)^{n-k+m+1}}. \end{aligned}$$
(12)

Note that here the true background mean b appears only as an integration variable, and that

$$\begin{aligned} \sum _{n=0}^\infty \varDelta P(n,m,\tau ,s) = 1, \end{aligned}$$
(13)

for any \(m,\tau , s\). The limit \(\lim _{\tau \rightarrow \infty } \varDelta P(n,m,\tau ,s)\), with \(m/\tau = {\hat{b}}\) held fixed, recovers the Poisson distribution \(P(n| s+ {\hat{b}})\). In the second equality of Eq. (12), we have written a form valid for non-integer n and m, both to define \(Z^{\mathrm{A}}\) below and to account for the fact that an estimated \({\hat{b}}\) and \(\varDelta _{{\hat{b}}}\) may correspond to non-integer m. The third equality is more useful when n is an integer, and also in the case \(s=0\) where only the \(k=0\) term survives and one can replace n! by \(\varGamma (n+1)\).
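The third form of Eq. (12) translates directly into code for integer n (a sketch; the helper name is ours, and the checks below use the normalization of Eq. (13) and the mean \(s + (m+1)/\tau \) that appears later in Eqs. (16)–(18)):

```python
import math

def delta_p(n, m, tau, s):
    """Eq. (12), third form: probability of n on-region counts given m
    off-region counts, averaged over the posterior of Eq. (11).
    Valid for integer n; m may be non-integer."""
    pref = tau**(m + 1) * math.exp(-s) / math.gamma(m + 1)
    total = 0.0
    for k in range(n + 1):
        total += (s**k / (math.factorial(k) * math.factorial(n - k))
                  * math.gamma(n - k + m + 1) / (tau + 1.0)**(n - k + m + 1))
    return pref * total
```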

The p-value for discovery has two equivalent forms,

$$\begin{aligned}&p_{\mathrm{disc}}(n,m,\tau ) = \sum _{k=n}^{\infty } \varDelta P(k, m, \tau , 0) \nonumber \\&\quad = B(1/(\tau +1), n, m+1)/B(n,m+1), \end{aligned}$$
(14)

where the first form was given in [10,11,12,13] and the second (involving the ordinary and incomplete beta functions) was obtained in a frequentist approach by [8, 9]. Despite appearances, these two forms are equivalent [11, 12], justifying the choice made in Eq. (11).
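The beta-function form of Eq. (14) is a single scipy call (a sketch); note that scipy's `betainc` is the regularized incomplete beta function, so the ratio in Eq. (14) is built in:

```python
from scipy.special import betainc

def p_disc_onoff(n, m, tau):
    """Eq. (14), second form: I_x(n, m+1) with x = 1/(tau+1),
    valid for non-integer n and m."""
    return betainc(n, m + 1.0, 1.0 / (tau + 1.0))
```

In the limit \(\tau \rightarrow \infty \) with \(m/\tau \) fixed, this reduces to the known-background p-value of Eq. (3).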

For exclusion, we find the following equivalent forms: the first follows directly from the definition but involves a double sum; the second, single-sum form is more efficient when n is an integer; and the last two forms are valid for non-integer n and m, have differing ease of numerical evaluation depending on the inputs, and follow from each other by integration by parts:

$$\begin{aligned}&p_{\mathrm{excl}}(n,m,\tau ,s) = \sum _{k=0}^n \varDelta P(k,m,\tau ,s)\nonumber \\&\quad = \sum _{k=0}^n \frac{\tau ^{m+1}}{(\tau +1)^{k+m+1}} \frac{\varGamma (k+m+1) \varGamma (n-k+1,s)}{k!\, \varGamma (m+1) \varGamma (n-k+1)} \nonumber \\&\quad = \frac{\tau ^{m+1}}{\varGamma (n+1) \varGamma (m+1)} \int _0^\infty \!\!\! db\> e^{-\tau b} b^m \varGamma (n+1, s+b) \nonumber \\&\quad = \left[ \varGamma (n+1,s) - e^{-s} \right. \nonumber \\&\qquad \left. \times \int _0^\infty \!\!\! db\> e^{-b} (s+b)^n \varGamma (m+1, \tau b)/\varGamma (m+1)\right] /\varGamma (n+1),\nonumber \\ \end{aligned}$$
(15)
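The single-sum (second) form of Eq. (15), for integer n, can be sketched as (helper name ours):

```python
import math
from scipy.special import gammaincc

def p_excl_onoff(n, m, tau, s):
    """Eq. (15), second form (integer n): gammaincc supplies the
    regularized ratio Gamma(n-k+1, s)/Gamma(n-k+1)."""
    total = 0.0
    for k in range(n + 1):
        total += (tau**(m + 1) / (tau + 1.0)**(k + m + 1)
                  * math.gamma(k + m + 1)
                  / (math.factorial(k) * math.gamma(m + 1))
                  * gammaincc(n - k + 1.0, s))
    return total
```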
Fig. 3

The median, mean, and exact Asimov expected significances for discovery and exclusion, for fixed ratios \(s/{\hat{b}}\) as labeled, as a function of s, for \(\varDelta _{{\hat{b}}}/{\hat{b}} = 0.2\). Here s and \({\hat{b}}\) are assumed to be proportional to their respective cross-sections multiplied by the integrated luminosity \(\int \mathcal{L} dt\) of the experiment; in particular, \(s = \sigma _s \int \mathcal{L} dt\), where \(\sigma _s\) is the signal cross-section

We can now consider the expected significances in the case that \({\hat{b}}\) and \(\varDelta _{{\hat{b}}}\) have been fixed, corresponding either to a calculation of the background with limited accuracy, or to a measurement of m for a given \(\tau \). This is done by generating pseudo-experiments for n, distributed according to the probabilities \(\varDelta P(n,m,\tau ,s)\) for discovery and \(\varDelta P(n,m,\tau ,0)\) for exclusion, and then evaluating the p-values according to Eq. (14) for discovery and Eq. (15) for exclusion. As before, we consider \(Z^{\mathrm{med}}\), \(Z^{\mathrm{mean}}\), and \(Z^{\mathrm{A}}\) obtained from the allowed pseudo-experiment data, each as functions of \(s,{\hat{b}}, \varDelta _{{\hat{b}}}\). Here, \(Z^{\mathrm{A}}\) is obtained by replacing n by its mean expected values. For the discovery and exclusion cases respectively, we find these are

$$\begin{aligned} \langle n_{\mathrm{disc}}\rangle = s + {\widetilde{b}} , \end{aligned}$$
(16)
$$\begin{aligned} \langle n_{\mathrm{excl}} \rangle = {\widetilde{b}} , \end{aligned}$$
(17)

where

$$\begin{aligned} {\widetilde{b}} = (m+1)/\tau = {\hat{b}} + \varDelta _{{\hat{b}}}^2/{\hat{b}} . \end{aligned}$$
(18)

Then

$$\begin{aligned} p^{\mathrm{Asimov}}_{\mathrm{disc}}(s,{\hat{b}}, \varDelta _{{\hat{b}}}) = p_{\mathrm{disc}}(\langle n_{\mathrm{disc}}\rangle , m, \tau ) , \end{aligned}$$
(19)
$$\begin{aligned} p^{\mathrm{Asimov}}_{\mathrm{excl}}(s,{\hat{b}}, \varDelta _{{\hat{b}}}) = p_{\mathrm{excl}}(\langle n_{\mathrm{excl}}\rangle , m, \tau , s) , \end{aligned}$$
(20)

which are converted to \(Z^{\mathrm{A}}_{\mathrm{disc}}\) and \(Z^{\mathrm{A}}_{\mathrm{excl}}\) as usual.
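Putting Eqs. (10) and (16)–(20) together, \(Z^{\mathrm{A}}\) for the uncertain-background case can be sketched as follows. This is an illustration only: the discovery p-value uses the beta form of Eq. (14) (valid for non-integer n), the exclusion p-value evaluates the integral form of Eq. (15) by numerical quadrature, and the integration cutoff is an ad hoc choice of this sketch:

```python
import math
from scipy.integrate import quad
from scipy.special import betainc, erfcinv, gammaincc, gammaln

def z_asimov_onoff(s, b_hat, delta_b):
    """Exact Asimov significances (Z^A_disc, Z^A_excl) for the on-off
    model, combining Eqs. (10) and (16)-(20)."""
    m = (b_hat / delta_b) ** 2            # Eq. (10) inverted
    tau = b_hat / delta_b**2
    b_tilde = (m + 1.0) / tau             # Eq. (18)
    # Discovery: Eq. (14), beta form, at non-integer n = s + b_tilde:
    p_disc = betainc(s + b_tilde, m + 1.0, 1.0 / (tau + 1.0))
    # Exclusion: Eq. (15), integral form, at non-integer n = b_tilde;
    # the prefactor tau^(m+1)/Gamma(m+1) is exponentiated via gammaln:
    n = b_tilde
    pref = math.exp((m + 1.0) * math.log(tau) - gammaln(m + 1.0))
    integrand = lambda b: b**m * math.exp(-tau * b) * gammaincc(n + 1.0, s + b)
    integral, _ = quad(integrand, 0.0, b_hat + 30.0 * delta_b)  # far tail negligible
    p_excl = pref * integral
    z = lambda p: 2.0**0.5 * erfcinv(2.0 * p)
    return z(p_disc), z(p_excl)
```

Consistent with the discussion below, increasing \(\varDelta _{{\hat{b}}}\) at fixed s and \({\hat{b}}\) lowers both returned significances.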

Note that the mean expected event count in the absence of signal, \({\widetilde{b}}\), is distinct from, and larger than, the measured background estimate, \({\hat{b}} = m/\tau \). The fact that \({\widetilde{b}} > {\hat{b}}\) can be understood heuristically as the statement that, for finite \(\tau \), a given m is more likely to have been a downward rather than upward fluctuation. As an extreme example, if \(m=0\), this could be a downward fluctuation of a non-zero true background, but obviously it could not be an upward one. Given \((m,\tau )\), depending on the experimental situation there may be other justifiable probability density functions besides Eq. (11), and the subsequent discussion carries through similarly for any other choice. If we had chosen a different Bayesian distribution in Eq. (11), then the expression for \({\widetilde{b}}\) (in terms of m and \(\tau \)) would change. For this reason, we prefer to give results directly in terms of the independent variable \({\hat{b}} = m/\tau \) corresponding to the direct measurement (or calculation) of the background, rather than \({\widetilde{b}}\).

Fig. 4

The expected significance measures \(Z^{\mathrm{A}}_{\mathrm{disc}}\) and \(Z^{\mathrm{A}}_{\mathrm{excl}}\), for fixed ratios \(s/{\hat{b}}\) as labeled, as a function of \(s = \sigma _s \int \mathcal{L} dt\), for \(\varDelta _{{\hat{b}}}/{\hat{b}} = 0, 0.2,\) and 0.5, as labeled. For discovery we show \(s/{\hat{b}} = 1, 10, 100\), and for exclusion \(s/{\hat{b}} = 0.5\) and 5. The shaded areas are the envelopes between the largest and smallest values of \(\varDelta _{{\hat{b}}}/{{\hat{b}}}\), for each \(s/{\hat{b}}\)

References [3] and [4] had earlier provided Asimov approximations to the median discovery and exclusion significances, respectively. Equations (5) and (6) above are the limits as \(\varDelta _{{\hat{b}}} \rightarrow 0\). However, the significance estimates defined in Refs. [3] and [4] are not directly comparable to our definitions when \(\varDelta _{{\hat{b}}} \not =0\), since they take the (unknown) true background mean b as input, rather than the point estimate \({\hat{b}} = m/\tau \) as we do here. If one ignores the distinction and considers \(b = {\hat{b}}\), then \(Z^{\mathrm{A}}_{\mathrm{disc}}\) and \(Z^{\mathrm{A}}_{\mathrm{excl}}\) as defined in this paper give more conservative significances than those obtained from [3, 4].

Results for \(Z^{\mathrm{med}}\), \(Z^{\mathrm{mean}}\), and \(Z^{\mathrm{A}}\) for discovery and exclusion are shown in Fig. 3 for \(\varDelta _{{\hat{b}}}/{\hat{b}} = 0.2\), this time for s and \({\hat{b}}\) both taken proportional to an integrated luminosity factor \(\int \mathcal{L} dt\) which represents the temporal progress of the experiment. We consider fixed ratios \(s/{\hat{b}} = 2, 10, 100\) for discovery and 0.5, 5 for exclusion. Again, the sawtooth behavior of \(Z^{\mathrm{med}}\) is evident, while \(Z^{\mathrm{mean}}\) and \(Z^{\mathrm{A}}\) both lie within or near its envelope, and can be taken as reasonable and monotonic measures of the expected discovery and exclusion capabilities. Note that \(Z^{\mathrm{A}}_{\mathrm{excl}}\) is more conservative than \(Z^{\mathrm{med}}_{\mathrm{excl}}\) or \(Z^{\mathrm{mean}}_{\mathrm{excl}}\) for higher integrated luminosities, while \(Z^{\mathrm{mean}}\) is slightly more conservative for discovery. As before, \(Z^{p\>\mathrm{mean}}_{\mathrm{disc}} = Z^{p\>\mathrm{mean}}_{\mathrm{excl}}\), not shown, gives far smaller values and cannot be recommended. In Fig. 4, we show \(Z^{\mathrm{A}}_{\mathrm{disc}}\) and \(Z^{\mathrm{A}}_{\mathrm{excl}}\) for \(\varDelta _{{\hat{b}}}/{\hat{b}} = 0, 0.2,\) and 0.5. Consistent with intuition, increasing the background uncertainty reduces the expected significances, with a much greater impact when \(s/{\hat{b}}\) is smaller.

4 Conclusion

In this paper, we have critically examined the use of median expected significance \(Z^{\mathrm{med}}\) and possible alternatives. We find that either \(Z^{\mathrm{mean}}\) or \(Z^{\mathrm{A}}\) as defined and evaluated above would be reasonable measures of the discovery and exclusion capabilities of counting experiments with known or uncertain backgrounds. They both give results that are similar to \(Z^{\mathrm{med}}\), but are monotonic in the expected way with respect to changes in background and signal means and background uncertainties. They are also considerably more conservative than previous Asimov approximations, especially when the background is small. The exclusion case with low event counts, where the sawtooth behavior of \(Z^{\mathrm{med}}_{\mathrm{excl}}\) is particularly prominent and problematic, is noteworthy, as the success of the Standard Model of particle physics suggests the future importance of limit-setting capabilities for experimental signals with small rates including rare decays, non-standard interactions, new heavy particle production, and dark matter searches. In this paper, we have not considered the effects of uncertainty in the number of predicted signal events; this could be an interesting and important subject of future investigations.

In comparing \(Z^{\mathrm{mean}}\) and \(Z^{\mathrm{A}}\), we note that there is no “correct” measure of the expected significance, since the various Z definitions are simply different answers to different questions. The \(Z^{\mathrm{A}}\) measure is typically slightly less conservative in evaluating discovery, and more conservative for exclusion prospects, than \(Z^{\mathrm{mean}}\). It may be simpler to extend \(Z^{\mathrm{A}}\) to the case of experiments that feature more complex statistics than just integer counts of events. Also, the \(Z^{\mathrm{A}}\) measure, based on the means of the data distributions, is not harder to evaluate than other estimates of Z, provided that the probability distributions are known analytically or numerically. In the counting experiments considered here, the evaluations of \(Z^{\mathrm{A}}_{\mathrm{disc}}\) and \(Z^{\mathrm{A}}_{\mathrm{excl}}\) require only directly plugging into Eqs. (8)–(9) for a known background, or Eqs. (10) and (14)–(20) for an uncertain background. For these reasons, we advocate that \(Z^{\mathrm{A}}\) be the standard significance measure for projected exclusions and discovery sensitivities in counting experiments.