1 Introduction

Unbinned maximum likelihood fits are an essential tool for parameter estimation in high energy physics, due to the desirable features of the maximum likelihood estimator. In the asymptotic limit the maximum likelihood estimator is normally distributed around the true parameter value and its variance is equal to the minimum variance bound [1, 2]. Furthermore, in the unbinned approach no information is lost due to binning.

The inclusion of weights into the maximum likelihood formalism is desirable in many applications. Examples are the statistical subtraction of background events in the sPlot formalism [3] through the use of per-event weights, and the correction of acceptance effects via weighting by the inverse efficiency. However, with the inclusion of per-event weights the confidence intervals determined by the inverse second derivative of the negative logarithmic likelihood (in the multidimensional case the inverse of the Hessian matrix of the negative logarithmic likelihood) are no longer asymptotically correct (Footnote 1). There are several approaches that are commonly used to determine confidence intervals in the presence of event weights. However, as will be shown below, not all of these techniques are guaranteed to give asymptotically correct coverage. In this paper, the asymptotically correct expressions for the determination of parameter uncertainties will be derived and then compared with these approaches.

This paper is structured as follows: In Sect. 2 the unbinned maximum likelihood formalism is briefly summarised and the inclusion of event weights is discussed. The asymptotically correct expression for static event weights is derived in Sect. 2.1 and compared with several commonly used approaches to determine uncertainties in weighted maximum likelihood fits in Sect. 2.2. Section 3.1 details the correction of acceptance effects through event weights and discusses the impact of weight uncertainties in this context. Section 3.2 derives the asymptotically correct expressions for parameter uncertainties from fits of sWeighted data, and also details the impact of potential nuisance parameters present in the determination of sWeights. Different approaches to the determination of parameter uncertainties are compared and contrasted using two specific examples in Sect. 4, an angular fit correcting for an acceptance effect (Sect. 4.1) and the determination of a lifetime when statistically subtracting background events using sWeights (Sect. 4.2). Finally, conclusions are given in Sect. 5.

2 Unbinned maximum likelihood fits and event weights

The maximum likelihood estimator for a set of \(N_P\) parameters \(\varvec{\lambda }=\left\{ \lambda _{1},\ldots ,\lambda _{N_P}\right\} \), given N independent and identically distributed measurements \(\varvec{x}=\left\{ x_1,\ldots ,x_N\right\} \), is determined by solving (typically numerically using a software package like Minuit [5]) the maximum likelihood condition

$$\begin{aligned} \left. \frac{\partial }{\partial \lambda _j}\ln {{\mathcal {L}}}(x_1,\ldots ,x_N;\varvec{\lambda })\right| _{\hat{\varvec{\lambda }}}&= 0\nonumber \\ \sum _{e=1}^N\left. \frac{\partial }{\partial \lambda _j} \ln {{\mathcal {P}}}(x_e;\varvec{\lambda })\right| _{\hat{\varvec{\lambda }}}&= 0, \end{aligned}$$
(1)

where \({{\mathcal {P}}}(x_e;\varvec{\lambda })\) denotes the probability density function evaluated for the event \(x_e\) and parameters \(\varvec{\lambda }\). Maximising the logarithmic likelihood \(\ln {{\mathcal {L}}}\) finds the parameters \(\hat{\varvec{\lambda }}\) for which the measured data \(\varvec{x}\) are most likely. The covariance matrix \(\varvec{C}\) for the parameters in the absence of event weights can be calculated from the inverse matrix of second derivatives (the Hessian matrix) of the negative logarithmic likelihood

$$\begin{aligned} C_{ij}&= -\left. \left( \frac{\partial ^2}{\partial \lambda _i\partial \lambda _j} \ln {{\mathcal {L}}}(x_1,\ldots ,x_N;\varvec{\lambda })\right| _{\hat{\varvec{\lambda }}}\right) ^{-1}\nonumber \\&= -\left. \left( \sum _{e=1}^N\frac{\partial ^2}{\partial \lambda _i\partial \lambda _j} \ln {{\mathcal {P}}}(x_e;\varvec{\lambda })\right| _{\hat{\varvec{\lambda }}}\right) ^{-1}, \end{aligned}$$
(2)

evaluated at \(\varvec{\lambda }=\hat{\varvec{\lambda }}\). When including event weights \(w_{e=1,\ldots ,N}\) to give each measurement a specific weight, the maximum likelihood condition becomes (Footnote 2)

$$\begin{aligned} \sum _{e=1}^N\left. w_e\frac{\partial }{\partial \lambda _j} \ln {{\mathcal {P}}}(x_e;\varvec{\lambda })\right| _{\hat{\varvec{\lambda }}}&= 0. \end{aligned}$$
(3)

Depending on the application, the weight \(w_e\) can be a function of \(x_e\) and other event quantities \(y_e\) that \({{\mathcal {P}}}(x_e;\varvec{\lambda })\) does not depend on directly. For efficiency corrections the weights are given by \(w_e(x_e,y_e)=1/\epsilon (x_e,y_e)\), as detailed in Sect. 3.1; for sWeights the weights \(w_e(y_e)\) depend on the discriminating variable y, as described in Sect. 3.2. It should be noted that the weighted inverse Hessian matrix

$$\begin{aligned} C_{ij}&= -\left. \left( \sum _{e=1}^N w_e\frac{\partial ^2}{\partial \lambda _i\partial \lambda _j} \ln {{\mathcal {P}}}(x_e;\varvec{\lambda })\right| _{\hat{\varvec{\lambda }}}\right) ^{-1} \end{aligned}$$
(4)

will generally not give asymptotically correct confidence intervals. This is most easily seen for constant weights \(w_e=w\), which result in an over-estimation (\(w>1\)) or under-estimation (\(w<1\)) of the statistical power of the sample, and hence in confidence intervals that under- or overcover.
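This pitfall can be illustrated numerically. The following sketch (a hypothetical toy, not taken from the paper) estimates the mean of a Gaussian with a constant event weight \(w=2\) and compares the variance from the weighted Hessian of Eq. 4 with the sandwich variance of Eq. 12 derived below:

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma, w = 10_000, 1.0, 2.0        # constant event weight w > 1
x = rng.normal(0.0, sigma, N)

# weighted ML estimate of the Gaussian mean (closed form)
mu_hat = np.average(x, weights=np.full(N, w))

score = (x - mu_hat) / sigma**2       # d ln P / d mu, per event
curv = np.full(N, -1.0 / sigma**2)    # d^2 ln P / d mu^2, per event

H = np.sum(w * curv)                  # weighted Hessian
var_naive = -1.0 / H                  # Eq. 4: sigma^2 / (w N), overstates precision
var_sandwich = np.sum(w**2 * score**2) / H**2  # Eq. 12: ~ sigma^2 / N

print(var_naive, var_sandwich)        # the naive variance is too small by a factor 1/w
```

For this toy the naive variance is \(\sigma^2/(wN)\), while the sandwich estimator recovers the correct \(\sigma^2/N\).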

2.1 Asymptotically correct uncertainties in the presence of per-event weights

To derive the parameter variance in the presence of event weights, which for now are considered to be static, the simple case of a single parameter \(\lambda \) is discussed first. In this case, the estimator \({\hat{\lambda }}\) is defined implicitly by the condition

$$\begin{aligned} \sum _{e=1}^N w_e \left. \frac{\partial \ln {{\mathcal {P}}}(x_e;\lambda )}{\partial \lambda }\right| _{{\hat{\lambda }}}&= 0 \end{aligned}$$
(5)

which is referred to as an estimating equation in the statistical literature. A central prerequisite for the following derivation is the propertyFootnote 3

$$\begin{aligned} E\biggl ( \sum _{e=1}^N w_e \left. \frac{\partial \ln {{\mathcal {P}}}(x_e;\lambda )}{\partial \lambda }\right| _{\lambda _0}\biggr )&= 0, \end{aligned}$$
(6)

which is shown for event weights to correct an acceptance effect in Sect. 3.1 (Eq. 29). The more complex case of sWeights will be discussed in Sect. 3.2 (see Eqs. 42–44). Using the fact that \(E( w\partial ^2 \ln {{\mathcal {P}}}/\partial \lambda ^2|_{\lambda })<0\) (see Eq. 30) it can be shown that the estimator \({\hat{\lambda }}\) defined by Eq. 5 is consistent [6]. We can then Taylor-expand Eq. 5 to first order around the (unknown) true value \(\lambda _0\), to which \({\hat{\lambda }}\) converges in the asymptotic limit of large N:

$$\begin{aligned} \sum _{e=1}^N w_e \left. \frac{\partial \ln {{\mathcal {P}}}(x_e;\lambda )}{\partial \lambda }\right| _{\lambda _0} {+} \left( {\hat{\lambda }}{-}\lambda _0\right) \sum _{e=1}^N w_e \left. \frac{\partial ^2 {\ln } {{\mathcal {P}}}(x_e;\lambda )}{\partial \lambda ^2}\right| _{\lambda _0} = 0. \end{aligned}$$
(7)

This equation can be rewritten as

$$\begin{aligned} {\hat{\lambda }}-\lambda _0&= -\frac{\sum _{e=1}^N w_e \left. \frac{\partial \ln {{\mathcal {P}}}(x_e;\lambda )}{\partial \lambda }\right| _{\lambda _0}}{\sum _{e=1}^N w_e \left. \frac{\partial ^2 \ln {{\mathcal {P}}}(x_e;\lambda )}{\partial \lambda ^2}\right| _{\lambda _0}} \nonumber \\&= -\frac{\sum _{e=1}^N w_e \left. \frac{\partial \ln {{\mathcal {P}}}(x_e;\lambda )}{\partial \lambda }\right| _{\lambda _0}}{N E\Bigl ( w \left. \frac{\partial ^2 \ln {{\mathcal {P}}}(x_e;\lambda )}{\partial \lambda ^2}\right| _{\lambda _0}\Bigr )} + {{\mathcal {O}}}(1/N), \end{aligned}$$
(8)

giving the deviation of the estimator \({\hat{\lambda }}\) from the true value \(\lambda _0\). Here, we used that the sum in the denominator goes to

$$\begin{aligned} \sum _{e=1}^N w_e \left. \frac{\partial ^2 \ln {{\mathcal {P}}}(x_e;\lambda )}{\partial \lambda ^2}\right| _{\lambda _0}&\rightarrow N E\biggl ( w \left. \frac{\partial ^2\ln {{\mathcal {P}}}(x;\lambda )}{\partial \lambda ^2}\right| _{\lambda _0}\biggr ) \end{aligned}$$
(9)

in the asymptotic limit due to the law of large numbers. Due to the central limit theorem, the numerator converges to a Gaussian distribution with mean zero (according to Eq. 6) and variance

$$\begin{aligned}&N {\mathrm {var}}\left( w\left. \frac{\partial \ln {{\mathcal {P}}}(x;\lambda )}{\partial \lambda }\right| _{\lambda _0}\right) \nonumber \\&\quad = N E\biggl ( w^2 \left. \frac{\partial \ln {{\mathcal {P}}}(x;\lambda )}{\partial \lambda }\right| _{\lambda _0}\left. \frac{\partial \ln {{\mathcal {P}}}(x;\lambda )}{\partial \lambda }\right| _{\lambda _0}\biggr ) . \end{aligned}$$
(10)

Using Eqs. 9 and 10, the variance in the asymptotic limit at leading order is thus given by

$$\begin{aligned} {\mathrm {var}}({\hat{\lambda }}-\lambda _0)&= E\left( ({\hat{\lambda }}-\lambda _0)^2 \right) \nonumber \\&= \frac{E\left( w^2\left. \frac{\partial \ln {{\mathcal {P}}}(x;\lambda )}{\partial \lambda }\right| _{\lambda _0}\left. \frac{\partial \ln {{\mathcal {P}}}(x;\lambda )}{\partial \lambda }\right| _{\lambda _0}\right) }{N E\left( w\left. \frac{\partial ^2\ln {{\mathcal {P}}}(x;\lambda )}{\partial \lambda ^2} \right| _{\lambda _0}\right) ^{2}}. \end{aligned}$$
(11)

The right-hand side of Eq. 11 is the inverse Godambe information [7, 8], which is central to the theory of estimating equations. As the estimator is consistent, we replace \(\lambda _0\) with \({\hat{\lambda }}\) in the asymptotic limit, and further estimate the expectation values through the sample, resulting in

$$\begin{aligned}&{\mathrm {var}}({\hat{\lambda }}-\lambda _0)\nonumber \\&\quad = \frac{\sum _{e=1}^N w_e^2 \left. \frac{\partial \ln {{\mathcal {P}}}(x_e;\lambda )}{\partial \lambda }\right| _{{\hat{\lambda }}}\left. \frac{\partial \ln {{\mathcal {P}}}(x_e;\lambda )}{\partial \lambda }\right| _{{\hat{\lambda }}}}{\left( \sum _{e=1}^N w_e\left. \frac{\partial ^2 \ln {{\mathcal {P}}}(x_e;\lambda )}{\partial \lambda ^2}\right| _{{\hat{\lambda }}} \right) \left( \sum _{e=1}^N w_e\left. \frac{\partial ^2 \ln {{\mathcal {P}}}(x_e;\lambda )}{\partial \lambda ^2}\right| _{{\hat{\lambda }}}\right) }. \end{aligned}$$
(12)

This expression is also known as the sandwich estimator. In the case where event weights are absent (\(w_e=1\)), the numerator in Eq. 11 cancels with one of the inverse Hessian matrices as in this case

$$\begin{aligned} E\biggl ( \left. \frac{\partial \ln {{\mathcal {P}}}(x;\lambda )}{\partial \lambda }\right| _{\lambda _0} \left. \frac{\partial \ln {{\mathcal {P}}}(x;\lambda )}{\partial \lambda }\right| _{\lambda _0} \biggr )&= -E\biggl ( \left. \frac{\partial ^2 \ln {{\mathcal {P}}}(x;\lambda )}{\partial \lambda ^2}\right| _{\lambda _0} \biggr ). \end{aligned}$$
(13)

For \(w_e=1\) the Godambe information thus simplifies to the well known Fisher information.
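The one-dimensional sandwich estimator of Eq. 12 can be made concrete with a small sketch (a hypothetical example; the exponential model and uniform weights are chosen here purely for illustration), exploiting that the estimating equation Eq. 5 for \({{\mathcal {P}}}(t;\tau )=e^{-t/\tau }/\tau \) has the closed-form solution \(\hat{\tau }=\sum _e w_e t_e/\sum _e w_e\):

```python
import numpy as np

rng = np.random.default_rng(2)
N, tau0 = 5_000, 2.0
t = rng.exponential(tau0, N)          # decay times
w = rng.uniform(0.5, 1.5, N)          # static per-event weights (illustrative)

# Eq. 5 for P(t; tau) = exp(-t/tau)/tau has the closed-form solution:
tau_hat = np.sum(w * t) / np.sum(w)

score = t / tau_hat**2 - 1.0 / tau_hat            # d ln P / d tau at tau_hat
curv = 1.0 / tau_hat**2 - 2.0 * t / tau_hat**3    # d^2 ln P / d tau^2 at tau_hat

H = np.sum(w * curv)                              # weighted Hessian (scalar)
var_naive = -1.0 / H                              # Eq. 4
var_sandwich = np.sum(w**2 * score**2) / H**2     # Eq. 12, the sandwich estimator

print(np.sqrt(var_sandwich))
```

For this model the sandwich variance reduces algebraically to \(\sum _e w_e^2 (t_e-\hat{\tau })^2/(\sum _e w_e)^2\), i.e. the familiar variance of a weighted mean.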

For the multidimensional case we analogously Taylor-expand Eq. 3 to first order, resulting in

$$\begin{aligned}&\sum _{e=1}^N w_e \left. \frac{\partial }{\partial \lambda _i}\ln {{\mathcal {P}}}(x_e;\varvec{\lambda })\right| _{\varvec{\lambda }_0} + \sum _{e=1}^N w_e \sum _{j=1}^{N_P} ({\hat{\lambda }}_j-\lambda _{0j})\nonumber \\&\quad \times \left. \frac{\partial ^2}{\partial \lambda _j\partial \lambda _i}\ln {{\mathcal {P}}}(x_e;\varvec{\lambda })\right| _{\varvec{\lambda }_0} = 0, \end{aligned}$$
(14)

which can be written as a matrix equation

$$\begin{aligned} \left. d_i\right| _{\varvec{\lambda }_0}&= - \sum _{j=1}^{N_P} \left. {H}_{ij}\right| _{\varvec{\lambda }_0}\left( {\hat{\lambda }}_j-\lambda _{0j}\right) ~~~\text {with the definitions}\nonumber \\ d_i&= \sum _{e=1}^N w_e\frac{\partial \ln {{\mathcal {P}}}(x_e;\varvec{\lambda })}{\partial \lambda _i} ~~~\text {and}\nonumber \\ {H}_{ij}&= \sum _{e=1}^N w_e\frac{\partial ^2}{\partial \lambda _i\partial \lambda _j} \ln {{\mathcal {P}}}(x_e;\varvec{\lambda }). \end{aligned}$$
(15)

Matrix inversion yields an expression for the deviation of the estimator \({\hat{\lambda }}_i\) from the true value \(\lambda _{0i}\)

$$\begin{aligned} {\hat{\lambda }}_i-\lambda _{0i}&= -\sum _{j=1}^{N_P}\left. {H}_{ij}^{-1} d_j\right| _{\varvec{\lambda }_0}. \end{aligned}$$
(16)

The covariance matrix \(\varvec{C}\) is then given by

$$\begin{aligned} {C}_{ij}&= E\left( ({\hat{\lambda }}_i-\lambda _{0i})({\hat{\lambda }}_j-\lambda _{0j})\right) \nonumber \\&= \sum _{k,l=1}^{N_P} \left. {H}^{-1}_{ik} E(d_k d_l) {H}^{-1}_{lj}\right| _{\hat{\varvec{\lambda }}}\nonumber \\&= \sum _{k,l=1}^{N_P} \left. {H}^{-1}_{ik} \left( \sum _{e=1}^N w_e^2\frac{\partial \ln {{\mathcal {P}}}(x_e;\varvec{\lambda })}{\partial \lambda _k}\frac{\partial \ln {{\mathcal {P}}}(x_e;\varvec{\lambda })}{\partial \lambda _l}\right) {H}^{-1}_{lj}\right| _{\hat{\varvec{\lambda }}}, \end{aligned}$$
(17)

which can be compactly written as

$$\begin{aligned} {C}_{ij}&= \sum _{k,l=1}^{N_P} \left. {H}^{-1}_{ik} {D}_{kl} {H}^{-1}_{lj}\right| _{\hat{\varvec{\lambda }}} \quad \text {with}\nonumber \\ {D}_{kl}&= \sum _{e=1}^N w_e^2\frac{\partial \ln {{\mathcal {P}}}(x_e;\varvec{\lambda })}{\partial \lambda _k}\frac{\partial \ln {{\mathcal {P}}}(x_e;\varvec{\lambda })}{\partial \lambda _l}. \end{aligned}$$
(18)

The above expressions are familiar from the derivation of Eq. 2 (in the absence of event weights) in standard textbooks (e.g. Ref. [9]). Equation 18 has been previously discussed in Ref. [4] in the context of event weights for efficiency correction. However, it does not seem to be commonly used and often one of the approaches detailed below in Sect. 2.2 is employed instead.
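In code, Eq. 18 amounts to sandwiching the squared-weight score matrix \(\varvec{D}\) between two inverse weighted Hessians. The sketch below (the function name and the two-parameter Gaussian toy are ours, purely illustrative) assembles the covariance from per-event gradients and second derivatives:

```python
import numpy as np

def sandwich_cov(weights, grads, hessians):
    """Eq. 18: C = H^-1 D H^-1 from per-event quantities.

    weights:  (N,) event weights
    grads:    (N, N_P) per-event d ln P / d lambda_k at lambda_hat
    hessians: (N, N_P, N_P) per-event second derivatives at lambda_hat
    """
    H = np.einsum('e,eij->ij', weights, hessians)            # weighted Hessian
    D = np.einsum('e,ei,ej->ij', weights**2, grads, grads)   # squared-weight scores
    Hinv = np.linalg.inv(H)
    return Hinv @ D @ Hinv

# toy usage: weighted fit of a Gaussian mean and width (closed-form estimates)
rng = np.random.default_rng(3)
N = 5_000
x = rng.normal(0.0, 1.0, N)
w = rng.uniform(0.5, 1.5, N)
mu = np.average(x, weights=w)
sg = np.sqrt(np.average((x - mu)**2, weights=w))

grads = np.stack([(x - mu) / sg**2,
                  -1.0 / sg + (x - mu)**2 / sg**3], axis=1)
hess = np.empty((N, 2, 2))
hess[:, 0, 0] = -1.0 / sg**2
hess[:, 0, 1] = hess[:, 1, 0] = -2.0 * (x - mu) / sg**3
hess[:, 1, 1] = 1.0 / sg**2 - 3.0 * (x - mu)**2 / sg**4

C = sandwich_cov(w, grads, hess)   # 2x2 covariance matrix for (mu, sigma)
```

The per-event derivatives would in practice come from the fit model; here they are written out analytically for the Gaussian.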

2.2 Commonly used approaches to uncertainties in weighted fits

Instead of the asymptotically correct approach for static event weights given by Eq. 18, other techniques, presented below, are often used to determine parameter uncertainties in weighted unbinned maximum likelihood fits. We stress that of the techniques (a)–(c) listed below only the bootstrapping approach (c) will result in generally asymptotically correct uncertainties.

  (a)

    A simple approach used sometimes (e.g. in Ref. [10]) is to rescale the weights \(w_e\) according to

    $$\begin{aligned} w_e^\prime&= w_e\frac{\sum _{e=1}^N w_e}{\sum _{e=1}^N w_e^2} \end{aligned}$$
    (19)

    and to use Eq. 4 with the weights \(w_e^\prime \). This rescales the weights such that their sum corresponds to Kish’s effective sample size [11]; however, this approach will not generally reproduce the result in Eq. 18.
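In code, the rescaling of Eq. 19 is a one-liner; the sketch below (the helper name `kish_rescale` is ours) also verifies that the rescaled weights sum to Kish’s effective sample size:

```python
import numpy as np

def kish_rescale(w):
    """Rescale weights as in Eq. 19, so that their sum equals
    Kish's effective sample size (sum w)^2 / (sum w^2)."""
    w = np.asarray(w, dtype=float)
    return w * np.sum(w) / np.sum(w**2)

w = np.array([0.5, 1.0, 2.0, 4.0])
w_prime = kish_rescale(w)
n_eff = np.sum(w)**2 / np.sum(w**2)   # Kish's effective sample size
print(w_prime.sum(), n_eff)           # both ~2.647
```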

  (b)

    A method proposed in Refs. [1, 2] is to correct the covariance matrix according to

    $$\begin{aligned} {C}_{ij}^\prime&= \sum _{k,l=1}^{N_P} \left. {H}_{ik}^{-1} {W}_{kl} {H}_{lj}^{-1}\right| _{\hat{\varvec{\lambda }}}, \end{aligned}$$
    (20)

    where \(\varvec{H}\) is the weighted Hessian matrix defined in Eq. 15 and \(\varvec{W}\) is the Hessian matrix determined using squared weights \(w_e^2\) according to

    $$\begin{aligned} {W}_{kl}&= -\sum _{e=1}^N w_e^2\frac{\partial ^2\ln {{\mathcal {P}}}(x_e;\varvec{\lambda })}{\partial \lambda _k\partial \lambda _l}. \end{aligned}$$
    (21)

    This method is the default used in the RooFit software package when fitting weighted events [12] and is thus widely used in particle physics. It corresponds to the result in Eq. 18 only if

    $$\begin{aligned}&E\biggl ( \sum _{e=1}^N w_e^2 \left. \frac{\partial \ln {{\mathcal {P}}}(x_e;\varvec{\lambda })}{\partial \lambda _k}\right| _{\hat{\varvec{\lambda }}} \left. \frac{\partial \ln {{\mathcal {P}}}(x_e;\varvec{\lambda })}{\partial \lambda _l}\right| _{\hat{\varvec{\lambda }}} \biggr ) \nonumber \\&\quad = -E\biggl ( \sum _{e=1}^N w_e^2 \left. \frac{\partial ^2\ln {{\mathcal {P}}}(x_e;\varvec{\lambda })}{\partial \lambda _k\partial \lambda _l}\right| _{\hat{\varvec{\lambda }}}\biggr ). \end{aligned}$$
    (22)

    This is, however, not generally the case, as becomes clear when rewriting the left- and right-hand sides of Eq. 22 according to

    $$\begin{aligned}&\sum _{e=1}^N w_e^2 \left. \frac{\partial \ln {{\mathcal {P}}}(x_e;\varvec{\lambda })}{\partial \lambda _k}\right| _{\hat{\varvec{\lambda }}} \left. \frac{\partial \ln {{\mathcal {P}}}(x_e;\varvec{\lambda })}{\partial \lambda _l}\right| _{\hat{\varvec{\lambda }}}\nonumber \\&= \sum _{e=1}^N \frac{w_e^2}{{{\mathcal {P}}}^2(x_e;\varvec{\lambda })} \left. \frac{\partial {{\mathcal {P}}}(x_e;\varvec{\lambda })}{\partial \lambda _k}\right| _{\hat{\varvec{\lambda }}} \left. \frac{\partial {{\mathcal {P}}}(x_e;\varvec{\lambda })}{\partial \lambda _l}\right| _{\hat{\varvec{\lambda }}}\quad \mathrm{and} \end{aligned}$$
    (23)
    $$\begin{aligned}&\qquad -\sum _{e=1}^N w_e^2 \left. \frac{\partial ^2\ln {{\mathcal {P}}}(x_e;\varvec{\lambda })}{\partial \lambda _k\partial \lambda _l}\right| _{\hat{\varvec{\lambda }}}\nonumber \\&\quad = \sum _{e=1}^N \frac{w_e^2}{{{\mathcal {P}}}^2(x_e;\varvec{\lambda })} \left. \frac{\partial {{\mathcal {P}}}(x_e;\varvec{\lambda })}{\partial \lambda _k}\right| _{\hat{\varvec{\lambda }}} \left. \frac{\partial {{\mathcal {P}}}(x_e;\varvec{\lambda })}{\partial \lambda _l}\right| _{\hat{\varvec{\lambda }}}\nonumber \\&\qquad -\sum _{e=1}^N\frac{w_e^2}{{{\mathcal {P}}}(x_e;\varvec{\lambda })} \left. \frac{\partial ^2 {{\mathcal {P}}}(x_e;\varvec{\lambda })}{\partial \lambda _k\partial \lambda _l}\right| _{\hat{\varvec{\lambda }}}. \end{aligned}$$
    (24)

    The expectation value of the second part on the right-hand side of Eq. 24 is not generally zero. While Refs. [1, 2] correctly derive that the expectation value

    $$\begin{aligned} E\biggl (\frac{w}{{{\mathcal {P}}}(x;\varvec{\lambda })}\frac{\partial ^2{{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _k\partial \lambda _l}\biggr )&= 0 \end{aligned}$$
    (25)

    for an efficiency correction \(\epsilon _e=1/w_e\), this is not generally the case for the expression with squared weights

    $$\begin{aligned} E\biggl (\frac{w^2}{{{\mathcal {P}}}(x;\varvec{\lambda })}\frac{\partial ^2{{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _k\partial \lambda _l}\biggr ) \end{aligned}$$
    (26)

    resulting in confidence intervals that are not generally asymptotically correct when using this approach. This will be detailed for efficiency corrections in Sect. 3.1. For the specific example discussed in Sect. 4.1, the corresponding expectation values are calculated explicitly in Appendix A.

  (c)

    A general approach for the determination of parameter uncertainties is to bootstrap the data [13]. Repeatedly resampling the data set with replacement generates new samples that can in turn be used to estimate the parameters \(\varvec{\lambda }\) using the maximum likelihood method. The width of the distribution of estimated parameter values can then be used as an estimator for the parameter uncertainty. This approach is generally valid; however, repeatedly (typically \({{\mathcal {O}}}(10^3)\) times) solving Eq. 3 numerically can be computationally expensive, making this approach often infeasible.
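A sketch of the bootstrap for a weighted fit (a hypothetical toy: an exponential lifetime whose weighted estimating equation has a closed-form solution, standing in for a numerical minimisation of Eq. 3); the key point is that each resampled event carries its weight along:

```python
import numpy as np

rng = np.random.default_rng(4)
N, tau0 = 2_000, 2.0
t = rng.exponential(tau0, N)
w = rng.uniform(0.5, 1.5, N)

def fit(t, w):
    # weighted ML lifetime estimate (closed form for an exponential pdf)
    return np.sum(w * t) / np.sum(w)

n_boot = 1_000
idx = rng.integers(0, N, size=(n_boot, N))       # resample events with replacement
boot = np.array([fit(t[i], w[i]) for i in idx])  # the weight travels with its event
tau_hat, tau_err = fit(t, w), boot.std(ddof=1)
print(tau_hat, tau_err)
```

For a realistic model the closed-form `fit` would be replaced by a numerical minimisation, which is exactly where the \({{\mathcal {O}}}(10^3)\) repetitions become expensive.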

3 Event weights and inclusion of weight uncertainties

3.1 Acceptance corrections

Following the notation in Refs. [1, 2], this section details the correction of acceptance effects using event weights. Acceptance of events with a certain probability \(\epsilon \), depending on the measurements \(x_e\) and \(y_e\), can be accounted for in unbinned maximum likelihood fits by using event weights \(w_e=1/\epsilon (x_e,y_e)\) in Eq. 3. The efficiency \(\epsilon (x,y)\) should be positive over the full phase space considered; regions of phase space where the efficiency is zero should be excluded from the analysis (Footnote 4). Here, we differentiate between the quantities x that the probability density function \({{\mathcal {P}}}(x;\varvec{\lambda })\) in Eq. 3 depends on directly, and potential additional quantities y that can depend on x. Using event weights can be advantageous when it is difficult or computationally expensive to determine the norm of the probability density function when including the efficiency as an explicit additional multiplicative factor \(\epsilon (x,y)\). The covariance in this case can be estimated using Eq. 18, as previously suggested in Ref. [4].

To determine expectation values it is necessary to include the acceptance effect in the probability density function. The probability density function \({{\mathcal {P}}}(x,y;\varvec{\lambda })\) gives the probability to find the measurements x and y depending on the parameters \(\varvec{\lambda }\) with

$$\begin{aligned} {{\mathcal {P}}}(x,y;\varvec{\lambda })&= {{\mathcal {P}}}(x;\varvec{\lambda }){{\mathcal {Q}}}(y;x) \end{aligned}$$
(27)

and the proper normalisation \(\int {{\mathcal {P}}}(x;\varvec{\lambda }){\text {d}}x=1\) and \(\int {{\mathcal {Q}}}(y;x){\text {d}}y=1\). The resulting total probability density function including the acceptance effect is then given by

$$\begin{aligned} {{\mathcal {G}}}(x,y;\varvec{\lambda })&= \frac{{{\mathcal {P}}}(x;\varvec{\lambda }){{\mathcal {Q}}}(y;x)\epsilon (x,y)}{\int {{\mathcal {P}}}(x;\varvec{\lambda }){{\mathcal {Q}}}(y;x)\epsilon (x,y){\text {d}}x{\text {d}}y} \nonumber \\&= {{\mathcal {P}}}(x;\varvec{\lambda }){{\mathcal {Q}}}(y;x)\epsilon (x,y)/{{\mathcal {N}}} \end{aligned}$$
(28)

with normalisation \({{\mathcal {N}}}\). This is the probability density function that needs to be used when determining expectation values. For the likelihood condition we find

$$\begin{aligned}&E\biggl ( w(x,y)\frac{\partial \ln {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _j} \biggr ) \nonumber \\&\quad = E\biggl ( w(x,y)\frac{1}{{{\mathcal {P}}}(x;\varvec{\lambda })}\frac{\partial {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _j}\biggr )\nonumber \\&\quad = \int \frac{1}{\epsilon (x,y)} \frac{1}{{{\mathcal {P}}}(x;\varvec{\lambda })} \frac{\partial {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _j} {{\mathcal {P}}}(x;\varvec{\lambda }){{\mathcal {Q}}}(y;x)\epsilon (x,y){\text {d}}x{\text {d}}y /{{\mathcal {N}}}\nonumber \\&\quad = \frac{\partial }{\partial \lambda _j}\int {{\mathcal {P}}}(x;\varvec{\lambda }){{\mathcal {Q}}}(y;x){\text {d}}x{\text {d}}y /{{\mathcal {N}}} = \frac{\partial }{\partial \lambda _j} 1 /{{\mathcal {N}}} =0, \end{aligned}$$
(29)

confirming the central property of Eq. 6. Further, we obtain

$$\begin{aligned}&E\biggl ( w(x,y)\frac{\partial ^2\ln {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _i\partial \lambda _j} \biggr ) \nonumber \\&\quad = E\biggl ( w(x,y) \frac{1}{{{\mathcal {P}}}(x;\varvec{\lambda })}\frac{\partial ^2 {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _i\partial \lambda _j}\biggr )\nonumber \\&\qquad - E\biggl ( w(x,y) \frac{1}{{{\mathcal {P}}}^2(x;\varvec{\lambda })}\frac{\partial {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _i} \frac{\partial {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _j}\biggr )\nonumber \\&\quad = \int \frac{1}{\epsilon (x,y)} \frac{1}{{{\mathcal {P}}}(x;\varvec{\lambda })} \frac{\partial ^2 {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _i\partial \lambda _j} {{\mathcal {P}}}(x;\varvec{\lambda }){{\mathcal {Q}}}(y;x)\epsilon (x,y){\text {d}}x{\text {d}}y /{{\mathcal {N}}}\nonumber \\&\qquad -\int \frac{1}{\epsilon (x,y)} \frac{1}{{{\mathcal {P}}}^2(x;\varvec{\lambda })} \frac{\partial {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _i}\nonumber \\&\qquad \times \frac{\partial {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _j} {{\mathcal {P}}}(x;\varvec{\lambda }){{\mathcal {Q}}}(y;x)\epsilon (x,y){\text {d}}x{\text {d}}y /{{\mathcal {N}}}\nonumber \\&\quad = -\int \frac{1}{{{\mathcal {P}}}(x;\varvec{\lambda })} \frac{\partial {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _i}\frac{\partial {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _j} {\text {d}}x /{{\mathcal {N}}}, \end{aligned}$$
(30)

where also the expectation value in Eq. 25 is shown, and

$$\begin{aligned}&E\biggl ( w(x,y)\frac{\partial \ln {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _i}\frac{\partial \ln {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _j} \biggr ) \nonumber \\&\quad = E\biggl ( w(x,y) \frac{1}{{{\mathcal {P}}}^2(x;\varvec{\lambda })}\frac{\partial {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _i}\frac{\partial {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _j}\biggr )\nonumber \\&\quad = \int \frac{1}{\epsilon (x,y)} \frac{1}{{{\mathcal {P}}}^2(x;\varvec{\lambda })} \frac{\partial {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _i}\nonumber \\&\qquad \times \frac{\partial {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _j} {{\mathcal {P}}}(x;\varvec{\lambda }){{\mathcal {Q}}}(y;x)\epsilon (x,y){\text {d}}x{\text {d}}y /{{\mathcal {N}}}\nonumber \\&\quad = -E\biggl ( w(x,y)\frac{\partial ^2\ln {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _i\partial \lambda _j} \biggr ). \end{aligned}$$
(31)

However, the equality derived above is not generally fulfilled for squared weights. In this case, we find

$$\begin{aligned}&E\biggl ( w^2(x,y)\frac{\partial ^2\ln {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _i\partial \lambda _j} \biggr ) \nonumber \\&\quad = E\biggl ( w^2(x,y) \frac{1}{{{\mathcal {P}}}(x;\varvec{\lambda })}\frac{\partial ^2 {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _i\partial \lambda _j}\biggr )\nonumber \\&\qquad - E\biggl ( w^2(x,y) \frac{1}{{{\mathcal {P}}}^2(x;\varvec{\lambda })}\frac{\partial {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _i} \frac{\partial {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _j}\biggr )\nonumber \\&\quad = \frac{\partial ^2}{\partial \lambda _i\partial \lambda _j} \int \frac{1}{\epsilon (x,y)} {{\mathcal {P}}}(x;\varvec{\lambda }) {{\mathcal {Q}}}(y;x){\text {d}}x{\text {d}}y /{{\mathcal {N}}}\nonumber \\&\qquad -\int \frac{1}{\epsilon (x,y)}\frac{1}{{{\mathcal {P}}}(x;\varvec{\lambda })} \frac{\partial {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _i}\nonumber \\&\qquad \times \frac{\partial {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _j} {{\mathcal {Q}}}(y;x){\text {d}}x{\text {d}}y /{{\mathcal {N}}} \end{aligned}$$
(32)

and

$$\begin{aligned}&E\biggl ( w^2(x,y)\frac{\partial \ln {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _i}\frac{\partial \ln {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _j} \biggr ) \nonumber \\&\quad = E\biggl ( w^2(x,y) \frac{1}{{{\mathcal {P}}}^2(x;\varvec{\lambda })}\frac{\partial {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _i}\frac{\partial {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _j}\biggr )\nonumber \\&\quad = \int \frac{1}{\epsilon (x,y)} \frac{1}{{{\mathcal {P}}}(x;\varvec{\lambda })} \frac{\partial {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _i}\frac{\partial {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _j}\nonumber \\&\qquad \times {{\mathcal {Q}}}(y;x){\text {d}}x{\text {d}}y /{{\mathcal {N}}}\nonumber \\&\quad = -E\biggl ( w^2(x,y)\frac{\partial ^2\ln {{\mathcal {P}}}(x;\varvec{\lambda })}{\partial \lambda _i\partial \lambda _j} \biggr )\nonumber \\&\qquad +\frac{\partial ^2}{\partial \lambda _i\partial \lambda _j} \int \frac{1}{\epsilon (x,y)} {{\mathcal {P}}}(x;\varvec{\lambda }) {{\mathcal {Q}}}(y;x){\text {d}}x{\text {d}}y /{{\mathcal {N}}}, \end{aligned}$$
(33)

where the term in the last line of Eq. 33, which corresponds to the expectation value in Eq. 26, is not generally zero, as the integral in the numerator can retain a dependence on \(\varvec{\lambda }\). For the example discussed in Sect. 4.1 this is explicitly calculated in Appendix A. This shows that parameter uncertainties determined using Eq. 20 are not generally asymptotically correct when performing weighted fits to account for acceptance corrections.

3.1.1 Weight uncertainties

If the weights to correct for an acceptance effect are only known to a certain precision, i.e. they are not fixed as assumed in Sect. 2.1, this induces an additional variance that is not included in Eq. 18 and that needs to be accounted for. This additional covariance can be estimated using standard error propagation, starting from Eq. 16. For weights depending on the \(N_T\) parameters \(p_m\) with covariance matrix \(\varvec{M}\), this results in

$$\begin{aligned}&{C}_{ij}^{\prime \prime } = \sum _{m,n=1}^{N_T} \frac{\partial ({\hat{\lambda }}_i-\lambda _{0i})}{\partial p_m} {M}_{mn} \frac{\partial ({\hat{\lambda }}_j-\lambda _{0j})}{\partial p_n}\quad \text {with} \end{aligned}$$
(34)
$$\begin{aligned}&\frac{\partial ({\hat{\lambda }}_i{-}\lambda _{0i})}{\partial p_m} {=} {-}\sum _{j=1}^{N_P} {H}_{ij}^{-1}\frac{\partial d_j}{\partial p_m}\biggr |_{\varvec{\lambda }_0} -\sum _{j=1}^{N_P}\frac{\partial {H}_{ij}^{-1}}{\partial p_m}d_j\biggr |_{\varvec{\lambda }_0}, \end{aligned}$$
(35)

where \(d_j\) and \({H}_{ij}\) are defined in Eq. 15. Due to the likelihood condition \(d_j\bigr |_{\hat{\varvec{\lambda }}}=0\), the second term in Eq. 35 behaves as \({{\mathcal {O}}}(1/\sqrt{N})\) and only the first term needs to be considered.

For the case of an efficiency histogram, where the efficiency is given in \(N_B\) bins with weight uncertainties \(\sigma _{m=1,\ldots ,N_B}\), weights inside a bin are fully correlated but typically uncorrelated with those in other bins. In this case, the additional covariance matrix that needs to be added to account for the weight uncertainties is given by

$$\begin{aligned} {C}_{ij}^{\prime \prime }&= \sum _{k,l=1}^{N_P} {H}_{ik}^{-1} \sum _{m=1}^{N_B}\left( \left[ \sum _{e\in \mathrm{bin}\,m} \left. \frac{\partial \ln {{\mathcal {P}}}(x_e;\varvec{\lambda })}{\partial \lambda _k}\right| _{\hat{\varvec{\lambda }}}\right] \sigma _m^2 \nonumber \right. \\&\left. \quad \times \left[ \sum _{e\in \mathrm{bin}\,m} \left. \frac{\partial \ln {{\mathcal {P}}}(x_e;\varvec{\lambda })}{\partial \lambda _l}\right| _{\hat{\varvec{\lambda }}}\right] \right) {H}_{lj}^{-1}. \end{aligned}$$
(36)
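A sketch of Eq. 36 (the function name and the toy numbers are ours): the per-event gradients are first summed within each efficiency bin, the per-bin sums are combined with the bin uncertainties \(\sigma _m\), and the result is sandwiched between the inverse weighted Hessians:

```python
import numpy as np

def binned_weight_cov(Hinv, grads, bin_idx, sigma_bins):
    """Eq. 36: extra covariance from per-bin efficiency-weight uncertainties.

    Hinv:       (N_P, N_P) inverse of the weighted Hessian of Eq. 15
    grads:      (N, N_P) per-event d ln P / d lambda_k at lambda_hat
    bin_idx:    (N,) efficiency-histogram bin index of each event
    sigma_bins: (N_B,) weight uncertainty sigma_m of each bin
    """
    S = np.zeros((len(sigma_bins), grads.shape[1]))
    np.add.at(S, bin_idx, grads)                   # gradient sums per efficiency bin
    mid = np.einsum('mk,m,ml->kl', S, sigma_bins**2, S)
    return Hinv @ mid @ Hinv

# hand-checkable toy: one parameter, three events in two bins
Hinv = np.array([[2.0]])
grads = np.ones((3, 1))
C2 = binned_weight_cov(Hinv, grads, np.array([0, 0, 1]), np.array([0.1, 0.2]))
print(C2)   # [[0.32]] = 2 * (2^2 * 0.01 + 1^2 * 0.04) * 2
```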
Fig. 1 Angular \(\cos \theta \) distribution of (black) data and (red) efficiency corrected events for \(10\,000\) pseudoexperiments consisting of 1000 events each

If the efficiency is modelled analytically, for example by a parameterisation that is fit to simulated samples, the impact of the uncertainty of the parameters \(\varvec{p}\) on the event weights \(w_e(\varvec{p})=1/\epsilon _e(\varvec{p})\) needs to be accounted for. For \(N_T\) parameters with covariance \(\varvec{M}\) the additional covariance matrix that needs to be added to Eq. 18 is given by

$$\begin{aligned} {C}_{ij}^{\prime \prime }&= \sum _{k,l=1}^{N_P}{H}_{ik}^{-1} \sum _{m,n=1}^{N_T}\biggl ( \sum _{e=1}^N \frac{\partial w_e}{\partial p_m} \left. \frac{\partial \ln {{\mathcal {P}}}(x_e;\varvec{\lambda })}{\partial \lambda _k}\right| _{\hat{\varvec{\lambda }}} \biggr ){M}_{mn}\nonumber \\&\quad \times \biggl ( \sum _{e=1}^N \frac{\partial w_e}{\partial p_n} \left. \frac{\partial \ln {{\mathcal {P}}}(x_e;\varvec{\lambda })}{\partial \lambda _l}\right| _{\hat{\varvec{\lambda }}} \biggr ) {H}_{lj}^{-1}. \end{aligned}$$
(37)

Identical results are obtained when using the systematic approach to error propagation that is employed for sWeights in the next section, which is based on combining the estimating equations for the parameters \(\varvec{p}\) and \(\varvec{\lambda }\) in a single vector.

3.2 The sPlot formalism

The sPlot formalism was introduced in Ref. [3] to statistically separate different event species in a data sample using per-event weights, the so-called sWeights, which are determined using a discriminating variable (in the following denoted by y). The sWeights allow the distribution of the different species to be reconstructed in a control variable (in the following denoted by x), assuming the control and discriminating variables are statistically independent for each species. In this section, only a brief recap of the sPlot formalism is given; it is described in more detail in Ref. [3].

The sWeights are determined using an extended unbinned maximum likelihood fit of the discriminating variable y where the \(N_S\) different event species are well separated. An example of a discriminating variable (which will be discussed in more detail in Sect. 4.2) would be the reconstructed mass of a particle which is flat for the background components and peaks clearly for the signal component. For the typical use case of a signal component of interest and a single background component, the sWeight for event e is given by

$$\begin{aligned} w_{\mathrm {s}}(y_{e})&= \frac{\hat{V}_{ss}\mathcal{P}_{\mathrm {s}}(y_e)+\hat{V}_{sb}\mathcal{P}_{\mathrm {b}}(y_e)}{\hat{N}_{\mathrm {s}}\mathcal{P}_{\mathrm {s}}(y_e)+\hat{N}_{\mathrm {b}}\mathcal{P}_{\mathrm {b}}(y_e)}\nonumber \\&= \frac{\hat{V}_{bb}^{-1}\mathcal{P}_{\mathrm {s}}(y_e)-\hat{V}_{sb}^{-1}\mathcal{P}_{\mathrm {b}}(y_e)}{(\hat{V}_{bb}^{-1}-\hat{V}_{sb}^{-1})\mathcal{P}_{\mathrm {s}}(y_e)+(\hat{V}_{ss}^{-1}-\hat{V}_{sb}^{-1})\mathcal{P}_{\mathrm {b}}(y_e)}, \end{aligned}$$
(38)

where \(\hat{N}_{\mathrm {s}}=\hat{V}_{ss}+\hat{V}_{sb}\) and \(\hat{N}_{\mathrm {b}}=\hat{V}_{bb}+\hat{V}_{sb}\) are used [3]. Retaining only the dependency on the inverse covariance matrix elements \(V_{ij}^{-1}\) simplifies the following derivations. The estimates for the inverse covariance matrix elements are given by

$$\begin{aligned} \hat{V}_{ij}^{-1}&= \sum _{e=1}^N \frac{\mathcal{P}_{i}(y_e)\mathcal{P}_{j}(y_e)}{\bigl (\hat{N}_{\mathrm {s}}\mathcal{P}_{\mathrm {s}}(y_e)+\hat{N}_{\mathrm {b}}\mathcal{P}_{\mathrm {b}}(y_e)\bigr )^2}, \quad \text {with expectation}\nonumber \\ V_{ij}^{-1}&= \int \frac{\mathcal{P}_{i}(y)\mathcal{P}_{j}(y)}{N_{\mathrm {s}}\mathcal{P}_{\mathrm {s}}(y)+N_{\mathrm {b}}\mathcal{P}_{\mathrm {b}}(y)}\mathrm {d}y. \end{aligned}$$
(39)
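
As an illustration of Eqs. 38 and 39, the following Python sketch (numpy/scipy assumed; the Gaussian-plus-flat toy model and all numerical values are illustrative assumptions, not taken from this paper) determines the signal yield from a toy mass sample, builds the inverse covariance elements \(\hat{V}_{ij}^{-1}\), and computes the sWeights. A well-known consistency property is that the sWeights sum to the fitted signal yield.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

rng = np.random.default_rng(1)
n_sig, n_bkg = 600, 400
lo, hi = 5.0, 5.6

# toy "mass" sample: Gaussian signal on a flat background (illustrative)
y = np.concatenate([rng.normal(5.28, 0.03, n_sig), rng.uniform(lo, hi, n_bkg)])
p_s = norm.pdf(y, 5.28, 0.03)            # signal pdf, shape assumed known
p_b = np.full_like(y, 1.0 / (hi - lo))   # flat background pdf
n = len(y)

# extended ML for the yields; with N_s + N_b = N this reduces to
# solving sum_e (P_s - P_b)/T_e = 0 for N_s
def score(ns):
    t = ns * p_s + (n - ns) * p_b
    return np.sum((p_s - p_b) / t)

ns_hat = brentq(score, 1.0, n - 1.0)
nb_hat = n - ns_hat
t = ns_hat * p_s + nb_hat * p_b

# inverse covariance elements (Eq. 39) and sWeights (first line of Eq. 38)
v_inv = np.array([[np.sum(p_s * p_s / t**2), np.sum(p_s * p_b / t**2)],
                  [np.sum(p_s * p_b / t**2), np.sum(p_b * p_b / t**2)]])
v = np.linalg.inv(v_inv)
w_s = (v[0, 0] * p_s + v[0, 1] * p_b) / t
```

At the maximum likelihood point the relation \(\hat{N}_{\mathrm {s}}=\hat{V}_{ss}+\hat{V}_{sb}\) holds exactly, so `w_s.sum()` reproduces `ns_hat` up to the root-finder tolerance.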

Using the sWeights in a weighted unbinned maximum likelihood fit allows events originating from species not of interest to be statistically subtracted [14], by including the sWeights as event weights in Eq. 3, resulting in the estimating equations

$$\begin{aligned} \sum _{e=1}^N w_{\mathrm {s}}(y_e;\hat{V}_{ss}^{-1},\hat{V}_{sb}^{-1},\hat{V}_{bb}^{-1}) \frac{\partial \ln {\mathcal{P}}(x_e;\varvec{\lambda })}{\partial \lambda _i}\biggr |_{\hat{\varvec{\lambda }}} = 0. \end{aligned}$$
(40)

The sWeights depend on estimates for the inverse covariance matrix elements \(V_{ij}^{-1}\) (Eq. 38), which in turn depend on estimates for the signal and background yields (Eq. 39) determined from the same sample.Footnote 5 To account for this effect, i.e. in order to systematically perform error propagation, it is useful to combine the estimating equations for the yields \(N_{\mathrm {s}}\) and \(N_{\mathrm {b}}\), the inverse covariance matrix elements \(V_{ij}^{-1}\), and the parameters of interest \(\varvec{\lambda }\) in a single vector

Fig. 2

Pull distributions from 10 000 pseudoexperiments for the different approaches to the uncertainty estimation for the efficiency correction \(\epsilon (\cos \theta )=1.0-0.7\cos ^2\theta \) at a total yield of 2000 events for each pseudoexperiment

Fig. 3

Pull distributions from 10 000 pseudoexperiments for the different approaches to the uncertainty estimation for the efficiency correction \(\epsilon (\cos \theta )=0.3+0.7\cos ^2\theta \) at a total yield of 2000 events for each pseudoexperiment

$$\begin{aligned} {\varvec{g}}(\varvec{x},\varvec{y};\varvec{\theta })&= \left( \begin{array}{c} \varphi _s(\varvec{y};N_{\mathrm {s}},N_{\mathrm {b}})\\ \varphi _b(\varvec{y};N_{\mathrm {s}},N_{\mathrm {b}})\\ \psi _{ss}(\varvec{y};V_{ss}^{-1},N_{\mathrm {s}},N_{\mathrm {b}})\\ \psi _{sb}(\varvec{y};V_{sb}^{-1},N_{\mathrm {s}},N_{\mathrm {b}})\\ \psi _{bb}(\varvec{y};V_{bb}^{-1},N_{\mathrm {s}},N_{\mathrm {b}})\\ \xi _i(\varvec{x},\varvec{y};\varvec{\lambda },V_{ss}^{-1},V_{sb}^{-1},V_{bb}^{-1}) \end{array} \right) \nonumber \\&= \left( \begin{array}{c} \sum _e\frac{\partial }{\partial N_{\mathrm {s}}}\bigl [\ln (N_{\mathrm {s}}\mathcal{P}_{\mathrm {s}}(y_e)+N_{\mathrm {b}}\mathcal{P}_{\mathrm {b}}(y_e)) - \frac{N_{\mathrm {s}}+N_{\mathrm {b}}}{N} \bigr ]\\ \sum _e\frac{\partial }{\partial N_{\mathrm {b}}}\bigl [\ln (N_{\mathrm {s}}\mathcal{P}_{\mathrm {s}}(y_e)+N_{\mathrm {b}}\mathcal{P}_{\mathrm {b}}(y_e)) - \frac{N_{\mathrm {s}}+N_{\mathrm {b}}}{N} \bigr ]\\ \sum _e \bigl [\frac{\mathcal{P}_{\mathrm {s}}(y_e)\mathcal{P}_{\mathrm {s}}(y_e)}{(N_{\mathrm {s}}\mathcal{P}_{\mathrm {s}}(y_e)+N_{\mathrm {b}}\mathcal{P}_{\mathrm {b}}(y_e))^2} - \frac{V_{ss}^{-1}}{N}\bigr ]\\ \sum _e \bigl [\frac{\mathcal{P}_{\mathrm {s}}(y_e)\mathcal{P}_{\mathrm {b}}(y_e)}{(N_{\mathrm {s}}\mathcal{P}_{\mathrm {s}}(y_e)+N_{\mathrm {b}}\mathcal{P}_{\mathrm {b}}(y_e))^2} - \frac{V_{sb}^{-1}}{N}\bigr ]\\ \sum _e \bigl [\frac{\mathcal{P}_{\mathrm {b}}(y_e)\mathcal{P}_{\mathrm {b}}(y_e)}{(N_{\mathrm {s}}\mathcal{P}_{\mathrm {s}}(y_e)+N_{\mathrm {b}}\mathcal{P}_{\mathrm {b}}(y_e))^2} - \frac{V_{bb}^{-1}}{N}\bigr ]\\ \sum _e w_{\mathrm {s}}(y_e;V_{ss}^{-1},V_{sb}^{-1},V_{bb}^{-1}) \frac{\partial \ln {\mathcal {P}}(x_e;\varvec{\lambda })}{\partial \lambda _i} \end{array} \right) , \end{aligned}$$
(41)

where \(\varvec{\theta }\) denotes the vector of parameters \(\varvec{\theta }=\{N_{\mathrm {s}},N_{\mathrm {b}},V_{ss}^{-1},V_{sb}^{-1},V_{bb}^{-1},\varvec{\lambda }\}\). It should be noted that solving \(\varvec{g}(\varvec{x},\varvec{y};\varvec{\theta })|_{\hat{\varvec{\theta }}}=0\) is equivalent to solving the estimating equations for the yields, the inverse covariance matrix elements, and the parameters of interest \(\varvec{\lambda }\) sequentially. It can be shown that \(E\bigl (\varvec{g}(\varvec{x},\varvec{y};\varvec{\theta })|_{\varvec{\theta }_0}\bigr )=0\), i.e. the estimating equations are unbiased,Footnote 6 as

$$\begin{aligned}&E\biggl (\varphi _i(\varvec{y};N_{\mathrm {s}},N_{\mathrm {b}})\bigr |_{\varvec{\theta }_0}\biggr ) = \int \frac{\mathcal{P}_{i}(N_{\mathrm {s}}\mathcal{P}_{\mathrm {s}}+N_{\mathrm {b}}\mathcal{P}_{\mathrm {b}})}{N_{\mathrm {s}}\mathcal{P}_{\mathrm {s}}+N_{\mathrm {b}}\mathcal{P}_{\mathrm {b}}}\mathrm {d}y - 1 = 0 \end{aligned}$$
(42)
$$\begin{aligned}&E\biggl (\psi _{ij}(\varvec{y};V_{ij}^{-1},N_{\mathrm {s}},N_{\mathrm {b}})\bigr |_{\varvec{\theta }_0}\biggr ) {=} \int \frac{\mathcal{P}_{i}\mathcal{P}_{j}}{N_{\mathrm {s}}\mathcal{P}_{\mathrm {s}}{+}N_{\mathrm {b}}\mathcal{P}_{\mathrm {b}}}\mathrm {d}y {-} E(V_{ij}^{-1}) {=} 0, \end{aligned}$$
(43)

and

$$\begin{aligned}&E\biggl (\xi _i(\varvec{x},\varvec{y};\varvec{\lambda },V_{ss}^{-1},V_{sb}^{-1},V_{bb}^{-1})\bigr |_{\varvec{\theta }_0}\biggr )\nonumber \\&\quad = E\biggl (\sum _ew_{\mathrm {s}}(y_e)\frac{\partial \ln {\mathcal {P}_{s}}(x_e;\varvec{\lambda })}{\partial \lambda _i}\biggr )\nonumber \\&\quad = \int \frac{V_{bb}^{-1}\mathcal{P}_{\mathrm {s}}(y)-V_{sb}^{-1}\mathcal{P}_{\mathrm {b}}(y)}{(V_{bb}^{-1}-V_{sb}^{-1})\mathcal{P}_{\mathrm {s}}(y)+(V_{ss}^{-1}-V_{sb}^{-1})\mathcal{P}_{\mathrm {b}}(y)} \frac{\partial \ln \mathcal{P}_{\mathrm {s}}(x;\varvec{\lambda })}{\partial \lambda _i}\nonumber \\&\qquad \times \bigl (N_{\mathrm {s}}\mathcal{P}_{\mathrm {s}}(x)\mathcal{P}_{\mathrm {s}}(y)+N_{\mathrm {b}}\mathcal{P}_{\mathrm {b}}(x)\mathcal{P}_{\mathrm {b}}(y)\bigr )\mathrm {d}x\mathrm {d}y\nonumber \\&\quad = \int \frac{V_{bb}^{-1}\mathcal{P}_{\mathrm {s}}(y)-V_{sb}^{-1}\mathcal{P}_{\mathrm {b}}(y)}{(V_{bb}^{-1}-V_{sb}^{-1})\mathcal{P}_{\mathrm {s}}(y)+(V_{ss}^{-1}-V_{sb}^{-1})\mathcal{P}_{\mathrm {b}}(y)}\mathcal{P}_{\mathrm {b}}(y)\mathrm {d}y\nonumber \\&\qquad \times \underbrace{\int N_{\mathrm {b}}\frac{\mathcal{P}_{\mathrm {b}}(x)}{\mathcal{P}_{\mathrm {s}}(x;\varvec{\lambda })}\frac{\partial \mathcal{P}_{\mathrm {s}}(x;\varvec{\lambda })}{\partial \lambda _i}\mathrm {d}x}_{\kappa _i}\nonumber \\&\quad = \frac{V_{bb}^{-1}V_{sb}^{-1}-V_{sb}^{-1}V_{bb}^{-1}}{V_{ss}^{-1}V_{bb}^{-1}-V_{sb}^{-1}V_{sb}^{-1}} \times \kappa _i=0. \end{aligned}$$
(44)

The covariance matrix for the full system in the asymptotic limit is given byFootnote 7 [6, 18]

$$\begin{aligned} {\varvec{C}}_{\varvec{\theta }}&= E\biggl (\frac{\partial \varvec{g}(\varvec{x},\varvec{y};\varvec{\theta })}{\partial \varvec{\theta }^T}\biggr )^{-1} \times E\bigl (\varvec{g}(\varvec{x},\varvec{y};\varvec{\theta })\varvec{g}(\varvec{x},\varvec{y};\varvec{\theta })^T\bigr )\nonumber \\&\quad \times E\biggl (\frac{\partial \varvec{g}(\varvec{x},\varvec{y};\varvec{\theta })}{\partial \varvec{\theta }^T}\biggr )^{-T}. \end{aligned}$$
(45)

The covariance can be estimated from the sample by replacing the expectation values \(E\bigl (\partial g_i(\varvec{x},\varvec{y};\varvec{\theta })/\partial \theta _j\bigr )\) and \(E\bigl (g_i(\varvec{x},\varvec{y};\varvec{\theta })g_j(\varvec{x},\varvec{y};\varvec{\theta })\bigr )\) by their sample estimates, which are given in Appendices B.1 and B.2, respectively.
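
Equation 45 is the standard sandwich covariance for estimating equations. A minimal numpy sketch, validated here on the analytically solvable case of a weighted mean with estimating equation \(\sum _e w_e(x_e-\mu )=0\) (all names and the toy data are illustrative), could look as follows:

```python
import numpy as np

def sandwich_cov(g_per_event, jac):
    """Sandwich covariance A^{-1} B A^{-T} (sketch of Eq. 45).

    g_per_event : (N, K) per-event contributions to the estimating equations
    jac         : (K, K) sum over events of d g/d theta^T, at theta_hat
    """
    b = g_per_event.T @ g_per_event   # sample estimate of E(g g^T)
    a_inv = np.linalg.inv(jac)
    return a_inv @ b @ a_inv.T

# cross-check against the textbook variance of a weighted mean
rng = np.random.default_rng(2)
x = rng.normal(1.0, 0.5, size=5000)
w = rng.uniform(0.5, 1.5, size=5000)
mu_hat = np.sum(w * x) / np.sum(w)    # solves sum_e w_e (x_e - mu) = 0

g = (w * (x - mu_hat))[:, None]       # (N, 1) estimating-function values
jac = np.array([[-np.sum(w)]])        # d/dmu of sum_e w_e (x_e - mu)
var = sandwich_cov(g, jac)[0, 0]
```

For this one-parameter case the sandwich reduces to \(\sum _e w_e^2(x_e-\hat{\mu })^2/(\sum _e w_e)^2\), which the sketch reproduces exactly.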

For the case of classic sWeights, where the shapes of all probability density functions are known, the above expression can be simplified further. The detailed calculation is given in Appendix B. For the covariance of the parameters of interest \(\varvec{\lambda }\) it results in

$$\begin{aligned} \varvec{C}_{\varvec{\lambda }}&= \varvec{H}^{-1}\varvec{H}^\prime \varvec{H}^{-T} - \varvec{H}^{-1}\varvec{E}\varvec{C}^\prime \varvec{E}^T\varvec{H}^{-T} \quad \text {with}\nonumber \\ {H}_{ij}&= \sum _e w_{\mathrm {s}}(y_e)\frac{\partial ^2\ln \mathcal{P}_{\mathrm {s}}(x_e;\varvec{\lambda })}{\partial \lambda _i\partial \lambda _j} \nonumber \\ {H}_{ij}^\prime&= \sum _e w_{\mathrm {s}}^2(y_e)\frac{\partial \ln \mathcal{P}_{\mathrm {s}}(x_e;\varvec{\lambda })}{\partial \lambda _i}\frac{\partial \ln \mathcal{P}_{\mathrm {s}}(x_e;\varvec{\lambda })}{\partial \lambda _j} \nonumber \\ {E}_{k(ij)}&= \sum _e \frac{\partial w_{\mathrm {s}}(y_e)}{\partial V_{ij}^{-1}}\frac{\partial \ln \mathcal{P}_{\mathrm {s}}(x_e;\varvec{\lambda })}{\partial \lambda _k} \nonumber \\ {C}_{(ij)(kl)}^\prime&= \sum _e \frac{\mathcal{P}_{i}(y_e)\mathcal{P}_{j}(y_e)\mathcal{P}_{k}(y_e)\mathcal{P}_{l}(y_e)}{\bigl (N_{\mathrm {s}}\mathcal{P}_{\mathrm {s}}(y_e)+N_{\mathrm {b}}\mathcal{P}_{\mathrm {b}}(y_e)\bigr )^4}. \end{aligned}$$
(46)

Note that using only the first term in Eq. 46 (which corresponds to Eq. 18) would not generally be asymptotically correct but instead conservative for the parameter variances, as the matrix \(\varvec{C}^\prime \) is positive definite.

The same technique used for the calculation of Eq. 46 can also be used to determine the (co)variance of the sum of sWeights in non-overlapping bins in the control variable x, which is needed to perform \(\chi ^2\) fits of binned sWeighted data. The detailed calculation for the covariance of \(\varvec{S}\) (with \(S_i=\sum _{e\,\in \,\text {bin}\,i}w_{\mathrm {s}}(y_e)\) for bin i) is given in Appendix C and results in

$$\begin{aligned}&\varvec{C}_{\varvec{S}} = \varvec{H}^\prime -\varvec{E}\varvec{C}^\prime \varvec{E}^T \quad \text {where}\nonumber \\&{H}^\prime _{ij} = \delta _{ij}\sum _{e\,\in \,\text {bin}\,i} w_{\mathrm {s}}^2(y_e)\nonumber \\&E_{k(ij)} = \sum _{e\,\in \,\text {bin}\,k} \frac{\partial w_{\mathrm {s}}(y_e)}{\partial V_{ij}^{-1}}\nonumber \\&C_{(ij)(kl)}^\prime = \sum _e \frac{\mathcal{P}_{i}(y_e)\mathcal{P}_{j}(y_e)\mathcal{P}_{k}(y_e)\mathcal{P}_{l}(y_e)}{\bigl (N_{\mathrm {s}}\mathcal{P}_{\mathrm {s}}(y_e)+N_{\mathrm {b}}\mathcal{P}_{\mathrm {b}}(y_e)\bigr )^4}. \end{aligned}$$
(47)

As is apparent, using only the first term in Eq. 47, i.e. using \(\sum _{e\,\in \,{\mathrm {bin}}\,i}w_{\mathrm {s}}^2(y_e)\) as an estimate for the variance of the content of bin i, is not generally asymptotically correct but conservative, as \(\varvec{C}^\prime \) is positive definite. The second term in Eq. 47 also induces correlations between bins which should be accounted for in a binned \(\chi ^2\) fit.
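
A numerical sketch of Eq. 47 (Python/numpy/scipy; an illustrative Gaussian-plus-flat toy is used, the true yields stand in for fitted ones, the bin assignment is arbitrary, and the weight derivatives are taken by finite differences) confirms that the corrected bin variances are never larger than the naive \(\sum w_{\mathrm {s}}^2\) estimates:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
lo, hi = 5.0, 5.6
y = np.concatenate([rng.normal(5.28, 0.03, 600), rng.uniform(lo, hi, 400)])
p_s = norm.pdf(y, 5.28, 0.03)
p_b = np.full_like(y, 1.0 / (hi - lo))
ns, nb = 600.0, 400.0                  # true yields, used for illustration
t = ns * p_s + nb * p_b

# inverse covariance elements (Eq. 39)
viss, visb, vibb = (np.sum(p_s * p_s / t**2),
                    np.sum(p_s * p_b / t**2),
                    np.sum(p_b * p_b / t**2))

def w_s(viss, visb, vibb):
    # sWeights in terms of the inverse covariance elements (Eq. 38, 2nd form)
    return ((vibb * p_s - visb * p_b)
            / ((vibb - visb) * p_s + (viss - visb) * p_b))

n_bins = 3
bins = rng.integers(0, n_bins, size=len(y))   # toy bin index in x per event
w = w_s(viss, visb, vibb)

# naive (conservative) diagonal H': sum of w^2 per bin
h_prime = np.diag([np.sum(w[bins == k]**2) for k in range(n_bins)])

# E_{k(ij)}: dw/dV^{-1}_{ij} summed per bin, via central finite differences
pars = [viss, visb, vibb]
e_mat = np.zeros((n_bins, 3))
for j in range(3):
    eps = 1e-4 * pars[j]
    up = list(pars); up[j] += eps
    dn = list(pars); dn[j] -= eps
    dw = (w_s(*up) - w_s(*dn)) / (2 * eps)
    for k in range(n_bins):
        e_mat[k, j] = np.sum(dw[bins == k])

# C'_{(ij)(kl)} = sum_e P_i P_j P_k P_l / T^4, for (ij) in [ss, sb, bb]
prods = [p_s * p_s, p_s * p_b, p_b * p_b]
c_prime = np.array([[np.sum(a * b / t**4) for b in prods] for a in prods])

c_s = h_prime - e_mat @ c_prime @ e_mat.T     # Eq. 47
```

Because \(\varvec{C}^\prime \) is a sum of outer products and hence positive semi-definite, the subtracted term can only reduce the diagonal elements.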

Fig. 4

(Left) Pull means and (right) pull widths for the efficiency correction \(\epsilon (\cos \theta )=1.0-0.7\cos ^2\theta \), depending on total event yield \(N_{\mathrm{tot}}\). The markers are slightly horizontally staggered to improve readability

Fig. 5

(Left) Pull means and (right) pull widths for the efficiency correction \(\epsilon (\cos \theta )=0.3+0.7\cos ^2\theta \), depending on total event yield \(N_{\mathrm{tot}}\). The markers are slightly horizontally staggered to improve readability

3.2.1 sWeights and nuisance parameters

Additional nuisance parameters \(\varvec{\alpha }\) present in the extended maximum likelihood fit of the event yields (e.g. shape parameters of \(\mathcal{P}_{\mathrm {s}}\) or \(\mathcal{P}_{\mathrm {b}}\)) can be easily included in the formalism used in the previous section. The estimating equations \(\varphi _i(\varvec{y};\varvec{\alpha },N_{\mathrm {s}},N_{\mathrm {b}})\) for the parameters \(\varvec{\alpha }\) need to be added to the vector \(\varvec{g}\) defined in Eq. 41, resulting in the modified vector

$$\begin{aligned}&{\varvec{g}^\prime }(\varvec{x},\varvec{y};\varvec{\theta }^\prime ) = \left( \begin{array}{c} \varphi _s(\varvec{y};\varvec{\alpha },N_{\mathrm {s}},N_{\mathrm {b}})\\ \varphi _b(\varvec{y};\varvec{\alpha },N_{\mathrm {s}},N_{\mathrm {b}})\\ \varphi _i(\varvec{y};\varvec{\alpha },N_{\mathrm {s}},N_{\mathrm {b}})\\ \psi _{ss}(\varvec{y};V_{ss}^{-1},\varvec{\alpha },N_{\mathrm {s}},N_{\mathrm {b}})\\ \psi _{sb}(\varvec{y};V_{sb}^{-1},\varvec{\alpha },N_{\mathrm {s}},N_{\mathrm {b}})\\ \psi _{bb}(\varvec{y};V_{bb}^{-1},\varvec{\alpha },N_{\mathrm {s}},N_{\mathrm {b}})\\ \xi _i(\varvec{x},\varvec{y};\varvec{\lambda },V_{ss}^{-1},V_{sb}^{-1},V_{bb}^{-1},\varvec{\alpha }) \end{array} \right) \nonumber \\&\quad = \left( \begin{array}{c} \sum _e\frac{\partial }{\partial N_{\mathrm {s}}}\bigl [\ln (N_{\mathrm {s}}\mathcal{P}_{\mathrm {s}}(y_e;\varvec{\alpha })+N_{\mathrm {b}}\mathcal{P}_{\mathrm {b}}(y_e;\varvec{\alpha })) - \frac{N_{\mathrm {s}}+N_{\mathrm {b}}}{N} \bigr ]\\ \sum _e\frac{\partial }{\partial N_{\mathrm {b}}}\bigl [\ln (N_{\mathrm {s}}\mathcal{P}_{\mathrm {s}}(y_e;\varvec{\alpha })+N_{\mathrm {b}}\mathcal{P}_{\mathrm {b}}(y_e;\varvec{\alpha })) - \frac{N_{\mathrm {s}}+N_{\mathrm {b}}}{N} \bigr ]\\ \sum _e\frac{\partial }{\partial \alpha _i}\bigl [\ln (N_{\mathrm {s}}\mathcal{P}_{\mathrm {s}}(y_e;\varvec{\alpha })+N_{\mathrm {b}}\mathcal{P}_{\mathrm {b}}(y_e;\varvec{\alpha })) - \frac{N_{\mathrm {s}}+N_{\mathrm {b}}}{N} \bigr ]\\ \sum _e \bigl [\frac{\mathcal{P}_{\mathrm {s}}(y_e;\varvec{\alpha })\mathcal{P}_{\mathrm {s}}(y_e;\varvec{\alpha })}{(N_{\mathrm {s}}\mathcal{P}_{\mathrm {s}}(y_e;\varvec{\alpha })+N_{\mathrm {b}}\mathcal{P}_{\mathrm {b}}(y_e;\varvec{\alpha }))^2} - \frac{V_{ss}^{-1}}{N}\bigr ]\\ \sum _e \bigl [\frac{\mathcal{P}_{\mathrm {s}}(y_e;\varvec{\alpha })\mathcal{P}_{\mathrm {b}}(y_e;\varvec{\alpha })}{(N_{\mathrm {s}}\mathcal{P}_{\mathrm {s}}(y_e;\varvec{\alpha })+N_{\mathrm {b}}\mathcal{P}_{\mathrm {b}}(y_e;\varvec{\alpha }))^2} - \frac{V_{sb}^{-1}}{N}\bigr ]\\ \sum _e \bigl [\frac{\mathcal{P}_{\mathrm {b}}(y_e;\varvec{\alpha })\mathcal{P}_{\mathrm {b}}(y_e;\varvec{\alpha })}{(N_{\mathrm {s}}\mathcal{P}_{\mathrm {s}}(y_e;\varvec{\alpha })+N_{\mathrm {b}}\mathcal{P}_{\mathrm {b}}(y_e;\varvec{\alpha }))^2} - \frac{V_{bb}^{-1}}{N}\bigr ]\\ \sum _e w_{\mathrm {s}}(y_e;V_{ss}^{-1},V_{sb}^{-1},V_{bb}^{-1},\varvec{\alpha }) \frac{\partial \ln {\mathcal {P}}(x_e;\varvec{\lambda })}{\partial \lambda _i} \end{array} \right) , \end{aligned}$$
(48)

where the vector of parameters is \(\varvec{\theta }^\prime =\{N_{\mathrm {s}},N_{\mathrm {b}},\varvec{\alpha },V_{ss}^{-1},V_{sb}^{-1},V_{bb}^{-1},\varvec{\lambda }\}\). The covariance matrix in the asymptotic limit is then given by

$$\begin{aligned} {\varvec{C}}_{\varvec{\theta }^\prime }&= E\biggl (\frac{\partial \varvec{g}^\prime (\varvec{x},\varvec{y};\varvec{\theta }^\prime )}{\partial \varvec{\theta }^{\prime T}}\biggr )^{-1} \times E\bigl (\varvec{g}^\prime (\varvec{x},\varvec{y};\varvec{\theta }^\prime )\varvec{g}^\prime (\varvec{x},\varvec{y};\varvec{\theta }^\prime )^T\bigr )\nonumber \\&\quad \times E\biggl (\frac{\partial \varvec{g}^\prime (\varvec{x},\varvec{y};\varvec{\theta }^\prime )}{\partial \varvec{\theta }^{\prime T}}\biggr )^{-T}. \end{aligned}$$
(49)

The covariance can again be estimated from the sample by replacing the expectation values \(E\bigl (\partial g_i^\prime (\varvec{x},\varvec{y};\varvec{\theta }^\prime )/\partial \theta _j^\prime \bigr )\) and \(E\bigl (g_i^\prime (\varvec{x},\varvec{y};\varvec{\theta }^\prime )g_j^\prime (\varvec{x},\varvec{y};\varvec{\theta }^\prime )\bigr )\) by their sample estimates. It should be noted that the nuisance parameters \(\varvec{\alpha }\) in this case will induce additional covariance terms beyond Eq. 46.Footnote 8

4 Examples

4.1 Correcting for an acceptance effect with event weights

The first example discussed in this paper is the fit of an angular distribution to determine angular coefficients, using event weights to correct for an acceptance effect. The probability density function used to generate and fit the pseudoexperiments is a simple second order polynomial in the angle \(\cos \theta \):

$$\begin{aligned} {{\mathcal {P}}}(\cos \theta ;c_0,c_1)&= \left( 1+c_0\cos \theta +c_1\cos ^2\theta \right) /{{\mathcal {N}}}\quad \text {with}\nonumber \\ {{\mathcal {N}}}&= \int _{-1}^{+1} \left( 1+c_0\cos \theta +c_1\cos ^2\theta \right) \mathrm {d}\cos \theta \nonumber \\&=2+\tfrac{2}{3}c_1. \end{aligned}$$
(50)

In the generation, the values \(c_0^{\mathrm{gen}}=0\) and \(c_1^{\mathrm{gen}}=0\) are used. Events are generated using a \(\cos \theta \)-dependent efficiency \(\epsilon (\cos \theta )\). Two efficiency shapes are studied, given by

  1. (a)

    \(\epsilon (\cos \theta ) = 1.0-0.7\cos ^2\theta \) and

  2. (b)

    \(\epsilon (\cos \theta ) = 0.3+0.7\cos ^2\theta \).

For simplicity, no uncertainty is assumed on the description of the acceptance effect by \(\epsilon (\cos \theta )\); otherwise the effect of uncertainties on event weights would need to be included as described in Sect. 3.1. Figure 1 shows the generated data (including the acceptance effect) in black and the efficiency corrected distributions, weighted by \(w_e=1/\epsilon (\cos \theta _e)\), in red.
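
The setup of this example can be reproduced with a short Python sketch (numpy/scipy assumed; the accept-reject generation and the Nelder-Mead minimiser are illustrative choices, not prescribed by the paper): events are generated flat in \(\cos \theta \), sculpted by acceptance (a), and the weighted fit with \(w_e=1/\epsilon (\cos \theta _e)\) recovers \(c_0\approx c_1\approx 0\).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

def efficiency(c):
    return 1.0 - 0.7 * c**2              # acceptance shape (a)

# generate: true distribution is flat (c0 = c1 = 0), sculpted by the acceptance
c = rng.uniform(-1, 1, size=50000)
keep = rng.uniform(0, 1, size=c.size) < efficiency(c)
c = c[keep]
w = 1.0 / efficiency(c)                  # per-event efficiency-correction weights

def nll(pars):
    """Weighted negative log-likelihood for Eq. 50."""
    c0, c1 = pars
    norm = 2.0 + 2.0 * c1 / 3.0
    p = (1.0 + c0 * c + c1 * c**2) / norm
    return -np.sum(w * np.log(np.clip(p, 1e-12, None)))

res = minimize(nll, x0=[0.1, 0.1], method="Nelder-Mead")
c0_hat, c1_hat = res.x
```

The fitted values should be statistically compatible with the generated \(c_0^{\mathrm{gen}}=c_1^{\mathrm{gen}}=0\); the various uncertainty estimates discussed below could then be attached to `c0_hat` and `c1_hat`.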

Fig. 6

Discriminating mass distribution for (black) the full data, (blue) signal and (red) background

The parameters \(c_0\) and \(c_1\) are determined using a weighted unbinned maximum likelihood fit, solving Eq. 3. Their uncertainties are determined using the approaches discussed in Sect. 2. The following methods are studied:

  1. (a)

    The method of using the uncertainties determined according to Eq. 4 without any correction, denoted as wFit in this section.

  2. (b)

    Scaling the weights according to Eq. 19. This approach is referred to as scaled weights.

  3. (c)

    Determining the covariance matrix using Eq. 20. This method is referred to as squared correction in the following.

  4. (d)

    Bootstrapping the data (using 1000 bootstraps) with replacement, denoted as bootstrapping.

  5. (e)

    The method to determine the covariance according to Eq. 18 as discussed in Sect. 2.1, referred to as asymptotic method.

  6. (f)

    A conventional fit (cFit) modelling the efficiency correction effect in the probability density function (and its normalisation) instead of using event weights.

The performance of the methods is compared using pseudoexperiments, with each study consisting of 10 000 toy data samples. The same data samples are used for every method. The distribution of the pull, defined as \(p_i(c_0)=(c_{0,i}-c_{0}^{\mathrm{gen}})/\sigma _i(c_0)\) (and analogously for parameter \(c_1\)), is used to test the different methods for uncertainty estimation. Here, the fitted value for parameter \(c_0\) in pseudoexperiment i is denoted as \(c_{0,i}\), the corresponding uncertainty is denoted as \(\sigma _{i}(c_0)\) and the generated value as \(c_0^{\mathrm{gen}}\). If the fit is unbiased and the uncertainties are determined correctly, the pull distribution is expected to be a Gaussian distribution with a mean compatible with zero and a width compatible with one. Different event yields per data sample (\(N=500\), 1000, 2000, 5000, \(10\,000\), \(20\,000\), \(50\,000\)) are studied to investigate the influence of statistics.
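
The pull-based validation procedure can be illustrated with a simple self-contained toy (Python/numpy; a Gaussian mean estimator stands in for the weighted fit, purely for illustration): if the uncertainty estimate is correct, the pull distribution has mean zero and width one.

```python
import numpy as np

rng = np.random.default_rng(5)
mu_gen, sigma, n = 1.0, 0.4, 400
n_toys = 2000

pulls = np.empty(n_toys)
for i in range(n_toys):
    x = rng.normal(mu_gen, sigma, size=n)
    mu_hat = x.mean()                     # "fitted" value in toy i
    err = x.std(ddof=1) / np.sqrt(n)      # its estimated uncertainty
    pulls[i] = (mu_hat - mu_gen) / err    # pull, as defined in the text

pull_mean = pulls.mean()
pull_width = pulls.std(ddof=1)
```

A pull width significantly below (above) one would signal overcoverage (undercoverage) of the uncertainty estimate, which is exactly the diagnostic applied to the methods compared in this section.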

Table 1 (Left) The parameters used in the generation of the pseudoexperiments. Only \(N_{\mathrm{sig}}\), \(N_{\mathrm{bkg}}\), and the background slope \(\alpha _{\mathrm{bkg}}\) are varied in the mass fit. The background slope \(\alpha _{\mathrm{bkg}}\) is then fixed for the determination of the sWeights. (Right) The mean correlation matrix from the mass fit when both the yields and \(\alpha _{\mathrm{bkg}}\) are allowed to float
Fig. 7

Decay time distributions for the four different background options for (black) the full data, (blue) signal and (red) background. The signal and background components are obtained using sWeights

The pull distributions for the parameters \(c_0\) and \(c_1\) for 2000 events are shown in Figs. 2 and 3. The pull means and widths depending on statistics are given in Figs. 4 and 5. Numerical values are given in Tables 2 and 3 in Appendix D. A few remarks are in order.

Fig. 8

Pull distributions from 10 000 pseudoexperiments for the different approaches to the uncertainty estimation for a total yield of 2000 events in each pseudoexperiment. The different figures shown correspond to the different background models as specified in Sect. 4

Fig. 9

Pull distributions from 10 000 pseudoexperiments for the different approaches to the uncertainty estimation for a total yield of 2000 events in each pseudoexperiment. The different figures shown correspond to the different background models as specified in Sect. 4

The wFit method is unbiased but shows significant undercoverage for \(c_0\) and \(c_1\) for both acceptance corrections tested. The scaled weights approach shows significant undercoverage for both \(c_0\) and \(c_1\) for acceptance (a) and overcoverage for acceptance (b). In both cases, the coverage remains incorrect even for high statistics. Both the wFit and the scaled weights methods are therefore strongly disfavoured for the determination of the parameter uncertainties in this example of a simple efficiency correction. The squared correction method shows good behaviour for parameter \(c_0\) but incorrect coverage for parameter \(c_1\): it overcovers for acceptance (a) and undercovers very significantly (more severely than even the wFit) for acceptance (b). The reason for this behaviour is the different expectation value of Eq. 26 with respect to the second derivatives in \(c_0\) and \(c_1\), as detailed in Appendix A. This illustrates that the squared correction method, which is widely used in particle physics, does not in general provide asymptotically correct confidence intervals when using event weights to correct for acceptance effects. Bootstrapping the data sample or using the asymptotic approach results in pull distributions with correct coverage for both \(c_0\) and \(c_1\) and both acceptance effects. No bias is observed for parameter \(c_0\) and only a small bias is found for \(c_1\) at low statistics. This paper therefore advocates the use of the asymptotic method (or alternatively bootstrapping) when using event weights to account for acceptance corrections. The pull distributions for the cFit also show, as expected, good behaviour. As there is no loss of information for the cFit, it can result in better sensitivity, as shown by the relative efficiencies given in Tables 2 and 3, and its use should be strongly considered where feasible.

4.2 Background subtraction using sWeights

As a second specific example for the determination of confidence intervals in the presence of event weights, the determination of the lifetime \(\tau \) of an exponential decay in the presence of background is discussed. The sPlot method [3] is used to statistically subtract the background component. The reconstructed mass is used as discriminating variable. In this example, the signal is distributed according to a Gaussian in the reconstructed mass, and the background is described by a single exponential function with slope \(\alpha _{\mathrm{bkg}}\). Figure 6 shows the mass distribution for the signal and background components. The parameters used in the generation of the pseudoexperiments are listed in Table 1a. The configuration is purposefully chosen such that there is a significant correlation between the yields and the slope of the background exponential, to illustrate the effect of fixing nuisance parameters in the sPlot formalism, as discussed in Sect. 3.2. The resulting mean correlation matrix for the mass fit is shown in Table 1b. The simpler case, where no significant correlation between \(\alpha _{\mathrm{bkg}}\) and the event yields is present due to a different choice of mass range, is discussed in Appendix F.

The decay time distribution (the control variable) that is used to determine the lifetime is a single exponential for the signal. For the background component, several different shapes are tested: (a) a single exponential with a long lifetime, (b) a Gaussian distribution, (c) a triangular distribution, and (d) a distribution flat in the decay time. Figure 7 shows the decay time distributions for the different options. The signal and background components shown are obtained using the sPlot formalism [3] described in Sect. 3.2.

The parameter \(\tau \) is determined using a weighted unbinned maximum likelihood fit solving the maximum likelihood condition Eq. 3 numerically. Its uncertainty \(\sigma (\tau )\) is determined using the different methods for weighted unbinned maximum likelihood fits discussed in Sect. 2. The following approaches are studied:

  1. (a)

    A weighted fit determining the uncertainties according to Eq. 4 without any correction. This method is denoted as sFit in the following.

  2. (b)

    Scaling the weights according to Eq. 19. The approach is denoted as scaled weights.

  3. (c)

    Determining the covariance matrix using Eq. 20. This method is referred to as squared correction.

  4. (d)

    Bootstrapping the data (using 1000 bootstraps) with replacement, without rederiving the sWeights (i.e. keeping the original sWeights for each event). Denoted as bootstrapping in the following.

  5. (e)

    Bootstrapping the data (again using 1000 bootstraps) and rederiving the sWeights for every bootstrapped sample, in the following denoted as full bootstrapping.

  6. (f)

    The asymptotic method to determine the covariance according to Eq. 46 as discussed in Sect. 3.2, but not accounting for the impact of nuisance parameters in the determination of the sWeights. This approach is referred to as asymptotic method.

  7. (g)

    The method to determine the covariance according to Eq. 49, which includes the effect of nuisance parameters in the sWeight determination. This method is denoted as full asymptotic.

  8. (h)

A conventional fit (cFit) modelling both signal and background components in two dimensions (mass and decay time). As the main point of using sWeights is to remove the need to model the background contribution in the fit, this method is included purely for comparison.

The performance of the different methods is evaluated using pseudoexperiments. Every study consists of 10 000 data samples, which are generated and then fitted for an initial determination of the sWeights. The same data samples are used for every method.
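
For background option (d), a flat decay time distribution, a single pseudoexperiment can be mimicked with a compact sketch, since the weighted estimating equation for an (untruncated) exponential reduces to a weighted mean. The following Python code (numpy/scipy assumed; true yields are used in place of the mass fit, and all model values are illustrative, not the generation values of Table 1a) computes the sWeighted lifetime estimate:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
n_sig, n_bkg = 600, 400
lo, hi = 5.0, 5.6
tau_gen = 1.0

# discriminating variable (mass): Gaussian signal, flat background
m = np.concatenate([rng.normal(5.28, 0.03, n_sig), rng.uniform(lo, hi, n_bkg)])
# control variable (decay time), independent of the mass for each species:
# exponential signal, flat background (background option (d))
t = np.concatenate([rng.exponential(tau_gen, n_sig), rng.uniform(0, 10, n_bkg)])

p_s = norm.pdf(m, 5.28, 0.03)
p_b = np.full_like(m, 1.0 / (hi - lo))
tot = n_sig * p_s + n_bkg * p_b          # true yields instead of a mass fit

# inverse covariance elements (Eq. 39) and sWeights (Eq. 38)
v_inv = np.array([[np.sum(p_s * p_s / tot**2), np.sum(p_s * p_b / tot**2)],
                  [np.sum(p_s * p_b / tot**2), np.sum(p_b * p_b / tot**2)]])
v = np.linalg.inv(v_inv)
w_s = (v[0, 0] * p_s + v[0, 1] * p_b) / tot

# weighted ML estimate of the lifetime: the estimating equation
# sum_e w_e (t_e/tau^2 - 1/tau) = 0 is solved by the weighted mean
tau_hat = np.sum(w_s * t) / np.sum(w_s)
```

In a full pseudoexperiment study, this fit would be repeated over many samples and the uncertainty \(\sigma (\tau )\) attached by each of the methods listed below.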

Fig. 10

(Left) Pull means and (right) pull widths depending on total event yield \(N_{\mathrm{tot}}\). The markers are slightly horizontally staggered to improve readability

Fig. 11

(Left) Pull means and (right) pull widths depending on total event yield \(N_{\mathrm{tot}}\). The markers are slightly horizontally staggered to improve readability

The performance of the different methods is compared using the distribution of the pull, defined as \(p_i(\tau )=(\tau _i-\tau ^{\mathrm{gen}}_{\mathrm{sig}})/\sigma _i(\tau )\). Here, \(\tau _i\) is the central value determined by the weighted maximum likelihood fit and \(\sigma _i(\tau )\) the uncertainty determined by the above methods. The lifetime used in the generation is denoted as \(\tau ^{\mathrm{gen}}_{\mathrm{sig}}\). To study the influence of statistics, pseudoexperiments are performed for different numbers of events. The total yields \(N_{\mathrm{tot}}=N_{\mathrm{sig}}+N_{\mathrm{bkg}}\) generated correspond to 400, 1000, 2000, 4000, 10 000 and 20 000 events. The signal fraction used in the generation is \(f_{\mathrm{sig}}=N_{\mathrm{sig}}/(N_{\mathrm{sig}}+N_{\mathrm{bkg}})=0.5\).

The pull distributions from 10 000 pseudoexperiments, each with a total yield of 2000 events, are shown in Figs. 8 and 9. The pull means and widths are shown in Figs. 10 and 11. Numerical values for the different configurations are given in Tables 4 and 5 in Appendix E.

As expected, both the sFit and the approach using scaled weights perform quite poorly, showing large undercoverage at both low and high statistics. Furthermore, they exhibit a significant bias at low statistics (which reduces at large statistics) due to a strong correlation of the uncertainty with the parameter \(\tau \). This strongly disfavours the use of these methods for the sWeighted examples studied here. The squared correction method performs better, but nevertheless exhibits a significant bias (which reduces at higher statistics) and undercoverage. It should be stressed that significant undercoverage is still present at large statistics, which shows that the squared correction method does not in general provide asymptotically correct confidence intervals. Both the bootstrapping and the asymptotic methods perform better for the examples studied here; however, both show some remaining undercoverage even at high statistics. It is instructive that bootstrapping the data without redetermining the sWeights performs identically to the asymptotic method without accounting for the uncertainty due to nuisance parameters. When performing a full bootstrapping, including rederiving the sWeights for the bootstrapped samples, or when using the full asymptotic method, the confidence intervals generally cover correctly and no significant biases are observed. Only at low statistics is some slight overcoverage observed. This paper therefore advocates the use of the full asymptotic method or, alternatively, if computationally feasible, the full bootstrapping approach for the determination of uncertainties in unbinned maximum likelihood fits using sWeights. If nuisance parameters have no large impact on the sWeights, the asymptotic method can also be appropriate, as shown in Appendix F.

The conventional fit describing the background component in the decay time explicitly instead of using sWeights also shows good behaviour, as expected. When the background distribution is known, a conventional fit is generally advantageous as it has improved sensitivity due to the additional available information. For this example, where the background pollution and parameter correlations are large, the parameter sensitivity is significantly improved when using the conventional (unweighted) fit, as shown by the relative efficiencies given in Tables 4 and 5.

5 Conclusions

This paper derives the asymptotically correct method to determine parameter uncertainties in the presence of event weights for acceptance corrections, which was previously discussed in Ref. [4] but does not currently see widespread use in high energy particle physics. The performance of this approach is validated on pseudoexperiments and compared with several commonly used methods. The asymptotically correct approach performs well, while several of the commonly used methods are shown not to result in correct coverage in general, even for large statistics. In addition, the effect of weight uncertainties for acceptance corrections is discussed. The paper furthermore derives asymptotically correct expressions for parameter uncertainties in fits that use event weights to statistically subtract background events using the sPlot formalism [3]. The asymptotically correct expression accounting for the presence of nuisance parameters in the determination of sWeights is also given. On pseudoexperiments the asymptotically correct methods perform well, whereas several commonly used methods show incorrect coverage also for this application. Finally, the (co)variance for the sum of sWeights in bins of the control variable is calculated, which is a prerequisite for binned \(\chi ^2\) fits of sWeighted data. If statistics are sufficiently large, this paper advocates the use of the asymptotically correct expressions in weighted unbinned maximum likelihood fits, in particular over the current nominal method used in the RooFit [12] fitting framework, which was proposed in Refs. [1, 2] and is shown to not generally result in asymptotically correct uncertainties. If computationally feasible, the bootstrapping approach [13] can be a useful alternative. A patch for RooFit allowing the determination of the covariance matrix according to Eq. 18 has been provided by the author and is available starting from ROOT v6.20.