Improving estimates of the number of fake leptons and other mis-reconstructed objects in hadron collider events: BoB's your UNCLE. (Previously "The Matrix Method Reloaded")

We consider current and alternative approaches to setting limits on new physics signals having backgrounds from misidentified objects; for example jets misidentified as leptons, b-jets or photons. Many ATLAS and CMS analyses have used a heuristic matrix method for estimating the background contribution from such sources. We demonstrate that the matrix method suffers from statistical shortcomings that can adversely affect its ability to set robust limits. A rigorous alternative method is discussed, and is seen to produce fake rate estimates and limits with better qualities, but is found to be too costly to use. Having investigated the nature of the approximations used to derive the matrix method, we propose a third strategy that is seen to marry the speed of the matrix method to the performance and physicality of the more rigorous approach.


I. INTRODUCTION
Many precision measurements and searches for new physics employ signal regions which have a significant source of background coming from 'fake' objects. A typical concrete example is that of leptons, which can be faked by a mis-reconstructed jet. Alternatively one can consider jets faking b-jets, for which the matrix method was used in [1], or even faking photons. In this article the term 'lepton' shall be used throughout, but all statements made are general to other types of object. Properties whose distributions differ for 'fake' and 'real' objects have been used to underpin data-driven methods of fake rate estimation, one of the most prevalent of which during the first data-taking run of the LHC has been the so-called 'matrix method' [2] used by ATLAS in , and by CMS in [32][33][34] based, apparently, on a description in [35].
The matrix method is the first of three ways of determining fake rates that are compared in this paper. We shall sometimes refer to it for short as Method A to facilitate easy comparison with the later Methods B and C. (For quick reference see Table I.) Method A makes use of the fact that fake and real leptons tend to differ in their degree of 'isolation' 1 . Using cuts on isolation (and to a lesser extent other variables) leptons are categorised as either 'tight' or 'loose', the former being largely synonymous with 'more isolated' and the latter with 'less isolated'. In this paper it is shown that the way in which the resulting matrix-method-derived background estimates are typically used in existing SUSY searches can give rise to confidence limits that are unstable (highly variable), indicating that they make non-optimal use of the data. More specifically, over the course of many independent experiments one expects to find a distribution of limits from Method A which has a larger variance (is more widely spread out) than the distribution of limits coming from Methods B and C discussed later. In addition, Method A can produce unphysically negative estimates for fake rates that should be bounded below by zero.

* UNCLE standing for "Un-biased Confidence Limit Evaluator", a somewhat artificial acronym designed to make the name of Method C in Table I.

TABLE I: An overview of the three methods discussed in this paper, and their relative strengths and weaknesses. 'Limit quality' refers to whether CL_{s+b} limits tend to have the correct frequentist coverage properties, and also avoid unnecessary over-coverage.
The weaknesses of the matrix method stem from the presence of heuristic, non-mathematical steps in its derivation. Heuristic methods are often useful, but can be problematic if their underlying assumptions are sometimes invalid; we show that this is the case with the matrix method. Methods lacking these deficiencies are in principle trivial to construct. In Method B we describe an example of such a method, in which a single likelihood is used to perform both the background estimation and the limit setting. Whilst this can be considered the optimal approach, it is shown to be computationally expensive in cases where objects are divided into many categories. 2 In order to marry the best of Methods A and B, we then propose Method C. It is intended that Method C be usable as a drop-in replacement for Method A in the contexts in which the latter has previously been used by ATLAS or CMS. Method C, like Method A, is partly heuristic (for speed) and so is justified pragmatically. However, careful choice of the approximations it contains allows it to always give physical limits whose distributions very closely resemble those of the optimal (but prohibitively expensive) Method B.
Note that fake rate estimates in LHC analyses are likely to find themselves being used as part of a CL_s frequentist limit-setting procedure, since these are endemic in ATLAS and CMS papers. Such usage requires a likelihood for a set of parameters given observed data; in the case where one counts events in just one region, it typically takes the form of a Poisson distribution having some mean. Background estimates in an analysis are then interpreted as an auxiliary measurement which constrains this mean through additional terms in the likelihood. In this context, the fake rate ought to be an estimate of the expected number of events from the fake background process in the signal region, given data collected outside of the signal region. This is not strictly adhered to in the matrix method, and so is one of the general respects in which Methods B and C improve on A.

Definition of terms
Events collected into some signal region are defined in terms of the numbers of leptons they contain. A cut on some measure of quality, for example isolation, distinguishes a given lepton as loose (l) or tight (t), where l ∪ t ≡ l̄ and l ∩ t = ∅. Each lepton will also be regarded as either real (r) or fake (f), depending on whether it is a correctly reconstructed lepton or, for example, a misreconstructed heavy-flavour jet. According to the precise selection, a certain number of tight and loose leptons will be required for an event to make it into a signal region; those that do are described as tight events (T), and those that do not, but could be made to pass the selection for some permutation of t and l on their constituent leptons, are denoted loose events (L), where as before L ∪ T ≡ L̄ and L ∩ T = ∅.
A core concept in all methods considered here is that of the real and fake efficiencies, respectively defined to be ε_r ≡ P(t | r l̄) and ε_f ≡ P(t | f l̄). For convenience we will also use ε̄_r ≡ 1 − ε_r = P(l | r l̄) and ε̄_f ≡ 1 − ε_f = P(l | f l̄). Typically these quantities are measured in additional control regions, and could be subdivided according to kinematic quantities, such as lepton p_T. In this text such categories will be labelled ω₁, ω₂, . . ., with the efficiencies gaining an additional subscript, e.g. ε_{ω₁ r}.
For a given event containing m leptons, each lepton is observed to be either l or t, and will have some category ω_i. If there are N_ω possible categories for each lepton, then the number of measurable event categories will be N_Ω = (2 × N_ω)^m. 3 Each of these will correspond to an event that is either L or T.
Experimentally, one counts how many events fall into each of the N_Ω sub-regions, yielding the set of integers {n_{Ω_i}}. For the purpose of the physics analysis being performed, one might be interested in the total number of tight events, n_T = Σ_{Ω_i ⊂ T} n_{Ω_i}. Usually this is the quantity with which a limit on the cross section of a new physics model is placed.
The observed numbers of events are often assumed to be particular values of Poisson-distributed random variables. That is, one can have n_T ∼ Poiss(ν_T); in general the indices on the rate ν correspond to those on the observation n.
A. Method A: the "matrix method"

This section attempts to document Method A, the matrix method, in more detail than has previously been done, and in its most general form. As mentioned previously, it is a somewhat heuristic method, but its assumptions shall be interpreted on a firmer statistical footing in a subsequent section.

Events with only one lepton
Consider first a simplified scenario where each event has exactly one lepton; n_T tight and n_L loose events are observed. The key relation is then

  n̄_T = ε_r n_R + ε_f n_F ,
  n̄_L = ε̄_r n_R + ε̄_f n_F ,        (1)

where n_R and n_F are the numbers of the observed events which are real and fake, respectively. In this context, n̄_L = E[n_L | n_R, n_F], and similarly for n̄_T. The result follows by considering the real/fake event counts to be random variables following a Poisson distribution, which are then further divided into tight and loose components according to a binomial distribution using the probabilities contained in the efficiencies.
It can be noted that equation (1) is similar to a relation between the means of the Poisson distributions,

  ν_T = ε_r ν_R + ε_f ν_F ,
  ν_L = ε̄_r ν_R + ε̄_f ν_F .        (2)

This is used later when discussing Method B and Method C, but for now we shall proceed with equation (1). This equation may legitimately be inverted provided that ε_r ≠ ε_f, yielding

  n_R = ( ε̄_f n̄_T − ε_f n̄_L ) / ( ε_r − ε_f ) ,
  n_F = ( ε_r n̄_L − ε̄_r n̄_T ) / ( ε_r − ε_f ) .        (3)

Given the model assumptions that were made, the steps described hitherto all hold water on mathematical grounds. In contrast, the next step that is usually taken to motivate Method A is quite arbitrary, and is justified largely on grounds that it is effective in situations with large numbers of events, rather than because it is meaningful in general. 4 This 'heuristic' step consists of the removal of the expectation brackets from the right hand side of equation (3) and the reinterpretation of the terms on its left hand side as a pair of quantities n̂_R and n̂_F as follows:

  n̂_R = ( ε̄_f n_T − ε_f n_L ) / ( ε_r − ε_f ) ,
  n̂_F = ( ε_r n_L − ε̄_r n_T ) / ( ε_r − ε_f ) .        (4)

What are n̂_R and n̂_F? They depend on n_T and n_L and so are functions of the data, and may be regarded as estimators, but estimators for what? It is shown in Appendix A that under some additional assumptions, and for certain values of n_T and n_L, they turn out to be maximum likelihood estimators for n_R and n_F given knowledge of n_T and n_L (i.e. estimators for n̄_R ≡ E[n_R | n_T, n_L] and n̄_F ≡ E[n_F | n_T, n_L]) 5 . Nonetheless, and in the absence of anything better, Method A instead uses n̂_R and n̂_F as estimators for the unknown and unknowable actual rates of real and fake events, ν_R and ν_F. Note that these estimators are sometimes rather poor, as equation (4) allows the terms on its left hand side to become unphysically negative. 6 This happens in real analyses (e.g. [5]), creating problems that need to be solved by ad hoc methods. Both Method B and Method C have the benefit of avoiding such problems.
Finally, Method A obtains its desired goal, the definition of an estimator for the expected number of fake events in the signal region, n̂_TF, motivated by equation (1) with the replacement n_F → n̂_F, where n̂_F is the estimator obtained above in (4). This results in:

  n̂_TF = ε_f n̂_F = ε_f ( ε_r n_L − ε̄_r n_T ) / ( ε_r − ε_f ) .        (5)

Again, note the problem with this method that, even if ε_r > ε_f (as must be the case for a useful definition of t and l), equation (5) can yield n̂_TF < 0, an unphysical result which is symptomatic of the earlier "sleight of hand". This concludes our description of how Method A is used in single-lepton events to calculate a number which is used as if it were an estimate of the expected rate of fakes in the signal region. We will now describe how the same method is extended for use in events with more than one lepton, which has not previously been documented in detail.
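As a concrete illustration, the single-lepton estimators of equations (4) and (5) take only a few lines of Python. This is a minimal sketch of our own (the function name and argument names are illustrative, not part of any experiment's code); the final usage note exhibits the unphysical negative estimate discussed above.

```python
def matrix_method(n_t, n_l, eff_r, eff_f):
    """Single-lepton matrix method estimators (sketch of eqs. (4)-(5)).

    Inverts the 2x2 efficiency matrix relating (tight, loose) counts to
    (real, fake) counts; requires eff_r != eff_f.
    """
    det = eff_r - eff_f
    n_r_hat = ((1.0 - eff_f) * n_t - eff_f * n_l) / det
    n_f_hat = (eff_r * n_l - (1.0 - eff_r) * n_t) / det
    n_tf_hat = eff_f * n_f_hat  # estimated tight fakes in the signal region
    return n_r_hat, n_f_hat, n_tf_hat
```

By construction n̂_R + n̂_F = n_T + n_L, but neither term is individually guaranteed to be non-negative: for example, matrix_method(5, 0, 0.9, 0.2) returns a negative n̂_F.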

Events with multiple leptons
Consider an event with two leptons, where each lepton can be in one of a number of categories {ω_i}. One may define quantities such as n_tl, the number of events with the first 7 lepton tight and the second loose; others are defined similarly. In order to include the possible categories for each lepton, event counts such as n_tt must be further subdivided to take into account all combinations:

  n_tt = Σ_{ω₁, ω₂} n_tt^{ω₁ω₂} .        (7)

In this notation, n_tt^{ω₁ω₂} indicates the number of events with two tight leptons, where the first is in category ω₁, and the second in ω₂.
The analogous relation to equation (1) is then

  (n̄_T)_{τ₁τ₂} = φ^{ω₁}_{τ₁}{}^{α₁} φ^{ω₂}_{τ₂}{}^{α₂} (n_R)_{α₁α₂} ,    φ^{ω_i} = [ ε_{ω_i r}  ε_{ω_i f} ; ε̄_{ω_i r}  ε̄_{ω_i f} ] ,        (8)

where summation over repeated upper and lower indices is implied where appropriate 8 (category superscripts on the counts are suppressed for clarity). The identifier n has been replaced for clarity with the symbols n_T and n_R, depending on whether the accompanying indices pertain to tight/loose-ness, e.g. (n_T)_{tl} = n_tl, or real/fake-ness, so (n_R)_{rf} = n_rf. Each Greek lower index of n_T hence takes values in {t, l}, while each Greek lower index of n_R takes values in {r, f}. The subscript on these indices indicates which lepton is being described; i.e. in equation (8), the value of α₂ represents whether the second lepton is real or fake. The matrix representation for φ^{ω_i} shown in the last line of equation (8) is not needed to understand this equation, but is required when considering the background estimate for events that are both tight and fake (it is what still identifies this as the "matrix method", despite the new notation).
The estimate for the expected number of events that are fake is then (n̂_TF)_{τ₁τ₂}, where

  (n̂_TF)_{τ₁τ₂} = φ_{τ₁}{}^{α₁} φ_{τ₂}{}^{α₂} ζ_{α₁α₂}{}^{β₁β₂} (n̂_R)_{β₁β₂} ,        (9)

in which (n̂_R)_{β₁β₂} is obtained from the observed counts by the multi-lepton analogue of equation (4), i.e. by applying the inverse of each per-lepton φ matrix to (n_T)_{τ₁τ₂}, and where the ζ object is responsible for defining what is meant by a fake event. For example, if rr ≡ R and {rf, fr, ff} ≡ F then one would choose ζ_{12}{}^{12} = ζ_{21}{}^{21} = ζ_{22}{}^{22} = 1, and all other components 0. There is in fact a redundancy in the indices, in that all non-zero components have the i-th lower index the same as the i-th upper index. In general therefore, for the case with any number of leptons,

  ζ_{α₁…α_m}{}^{β₁…β_m} = δ_{α₁}^{β₁} ⋯ δ_{α_m}^{β_m} h(β₁, …, β_m) ,        (10)

where δ^j_i is the Kronecker delta, and h(β₁, . . .) is a function of the indices that is 1 for a fake combination, and 0 for a real combination.
In order to estimate the number of events contained within n̂_TF, from equation (9), that are tight, one sums the appropriate component(s). For example, a simple analysis selecting final states with exactly two leptons might define T ≡ tt (i.e. the number of tight events would now be denoted n_T ≡ n_tt), and all other possibilities to be L. In this case the (n̂_TF)_tt component is the estimate of the number of events that are both tight and fake. For the completely general case with events containing arbitrary numbers of leptons, additional terms and indices are added as necessary to the equations in this section.
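For a two-lepton event with a single kinematic category, the index gymnastics above reduce to a pair of per-lepton matrix inversions, a projection onto the fake combinations, and a re-application of the tight rows of φ. The following numpy sketch (our own illustrative code, with index value 0 meaning tight/real and 1 meaning loose/fake) makes the contraction pattern explicit:

```python
import numpy as np

def matrix_method_2lep(n_tl, eff_r, eff_f):
    """Two-lepton matrix method (sketch of the scheme in eqs. (8)-(9)).

    n_tl: 2x2 array of observed counts indexed [lepton1, lepton2],
          with index value 0 = tight and 1 = loose.
    Returns the estimated real/fake counts (0 = real, 1 = fake) and the
    estimated number of tight-tight events containing at least one fake.
    """
    phi = np.array([[eff_r, eff_f],
                    [1.0 - eff_r, 1.0 - eff_f]])  # rows t/l, columns r/f
    phi_inv = np.linalg.inv(phi)
    # invert once per lepton: n_hat[a, b] with a, b in {real, fake}
    n_hat = np.einsum('ai,bj,ij->ab', phi_inv, phi_inv, n_tl)
    fake_mask = np.ones((2, 2))
    fake_mask[0, 0] = 0.0  # the rr combination is the only 'real' event type
    # re-apply the tight rows of phi to project fakes into the tt region
    n_tt_fake = np.einsum('a,b,ab->', phi[0], phi[0], fake_mask * n_hat)
    return n_hat, n_tt_fake
```

Feeding in counts generated exactly from known real/fake rates recovers those rates, mirroring the invertibility of equation (8).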

Limit setting
ATLAS and CMS analyses use the CL_s method [36] to place an upper limit on the event rate of new physics processes (in the sense of the mean of the Poisson distribution controlling the appearance of events in a signal region).
In the context of limit setting, the output from the matrix method is treated on a par with those irreducible background components estimated from Monte Carlo (MC) simulated samples. Once the central value is estimated as described in Section II A, uncertainties in the measured efficiencies, as well as a statistical uncertainty, can be propagated in the usual way by taking derivatives [37]. The background mean b̄ and uncertainty σ_b are fed into a joint likelihood for the signal and background rates, μ and b, given the number of events observed in the signal region, n_T. In the case with only one background source it takes the form

  L(μ, b | n_T) = Poiss(n_T | μ + b) × Gauss(b | b̄, σ_b) .        (11)

When setting the limit, the nuisance parameter b is profiled away in the usual way to form the test statistic q_μ,

  q_μ = −2 ln [ L(μ, b̂_μ) / L(μ̂, b̂) ] ,        (12)

where b̂_μ maximises L for the given μ, and (μ̂, b̂) is the global maximum. Confidence intervals (CL_s or CL_{s+b}) at some level can then be formed by following the recipe outlined in [36].
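The likelihood of equation (11) and the profiling in equation (12) can be sketched as follows. The grid-scan minimisation stands in for a proper minimiser such as MINUIT, the function names are ours, and constant terms in the negative log likelihood are dropped (which leaves q_μ unchanged):

```python
import math

def nll(mu, b, n_obs, b_bar, sigma_b):
    """-ln L for eq. (11): Poisson(n_obs; mu + b) x Gaussian(b; b_bar, sigma_b)."""
    lam = mu + b
    if lam <= 0.0:
        return float('inf')
    return (lam - n_obs * math.log(lam) + math.lgamma(n_obs + 1)
            + 0.5 * ((b - b_bar) / sigma_b) ** 2)

def profiled_nll(mu, n_obs, b_bar, sigma_b, steps=400):
    """Minimise the NLL over the nuisance parameter b with a fine grid scan."""
    b_hi = b_bar + 10.0 * sigma_b
    return min(nll(mu, 1e-9 + i * b_hi / steps, n_obs, b_bar, sigma_b)
               for i in range(steps + 1))

def q_mu(mu, n_obs, b_bar, sigma_b):
    """Profile-likelihood test statistic of eq. (12) (illustrative only)."""
    nll_mu = profiled_nll(mu, n_obs, b_bar, sigma_b)
    # global fit over mu >= 0 by a coarse scan, always including mu itself
    grid = [2.0 * max(n_obs, 1) * i / 200 for i in range(201)]
    nll_hat = min([nll_mu] + [profiled_nll(m, n_obs, b_bar, sigma_b) for m in grid])
    return 2.0 * (nll_mu - nll_hat)
```

As expected from its definition, q_μ is non-negative and grows as the tested μ moves away from the best-fit signal strength.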

B. Method B: An extended likelihood method
Whilst Method A can suffer from under-coverage, as subsequently discussed in Section III, this can largely be avoided for a purely data-driven background if the full likelihood, including all data used to make the measurement, is used in the limit-setting procedure. That is, one should use L(μ, θ | n_t, n_l, n_tt, . . .), where θ represents the set of nuisance parameters. If the leptons can fall into one of several categories, these quantities should be replaced with the separate terms from equation (7). Each of these quantities can be considered as an independent random variable with a Poisson distribution. The means of these Poisson distributions will be denoted as functions of the parameters, e.g. ν_tt^{ω₁ω₁}(μ, θ); the likelihood then factorises and takes a form similar to equation (11),

  L(μ, θ | {n}) = [ Π_c Poiss(n_c | ν_c(μ, θ)) ] × G(θ) ,        (14)

where the product runs over the N_Ω event categories c. The final term represents constraints placed on the nuisance parameters by external measurements.

Choice of parameterisation
The efficacy of any likelihood method depends on a sensible choice of parameterisation. The parameterisation must completely describe how events from both signal and background are expected to be divided between the different event categories, without overparameterising. For example, one could directly use ν_tt^{ω₁ω₁} etc. as the free parameters θ, but this would remove all predictive power! Other researchers [38] have investigated the possibility of applying a method that uses a similar parameterisation to the matrix method. This parameterisation uses the efficiencies described before, in addition to the rates separated both by object category and by real/fake-ness. Whilst this has the advantage of making minimal assumptions about how a given background process distributes itself between these categories, it does lead to a very large parameter space. For example, even with three objects coming from only three possible categories, there are already 80 such parameters (before considering efficiencies). Since any form of prediction will require a maximisation of the likelihood over this parameter space, and since such global maximisations become computationally more expensive as dimensionality increases, the authors have chosen to use an alternative parameterisation.

Decision tree parameterisation
Diagrammatically, the parameterisation used in this work is displayed in Figure 1. For every event that is generated, it is first decided how many leptons that event ought to contain. This is controlled by a set of parameters {α_m}, each of which corresponds to the probability of forming an event with m leptons. As noted in the caption, these must sum to 1. For each lepton, a category ω_i is assigned to it with probability β_i, and it is then further assigned to be either f with probability π_i or r with probability 1 − π_i. Formally, β_i ≡ P(ω_i | l̄) and π_i ≡ P(f | ω_i l̄). Efficiencies are then used in the usual way to select objects as being t or l.
Using these terms, together with one extra non-negative parameter denoting the mean of the Poisson distribution controlling the total production of tight events 9 , one can compute terms such as ν_tt^{ω₁ω₁} in equation (14). It should be noted that one of these trees must exist for every separate 'component' that is being fitted; that is, at least one for the hypothesised signal process and one for the fake component of the background, and then optionally one or more for other background components that have been estimated using MC samples.

9 One could alternatively use the overall production of L̄ events; however, it is essential to have the rate of T events as a parameter for any signal component, since this is the quantity upon which one wishes to place a limit.
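A toy generator following the tree of Figure 1 is straightforward. The following is a minimal Python sketch of our own, with parameter names mirroring those in the text (α_m, β_i, π_i and the efficiencies); it is illustrative and not the generator used for the studies in Section III:

```python
import random

def generate_event(alphas, betas, pis, eff_r, eff_f, rng=random):
    """Draw one event from the decision tree of Fig. 1 (illustrative sketch).

    alphas[m]: probability of producing m leptons (must sum to 1)
    betas[i]:  probability that a lepton falls in category i
    pis[i]:    probability that a lepton in category i is fake
    eff_r, eff_f: P(tight | real) and P(tight | fake)
    Returns a list of (category, 'r'/'f', 't'/'l') tuples, one per lepton.
    """
    m = rng.choices(range(len(alphas)), weights=alphas)[0]
    leptons = []
    for _ in range(m):
        cat = rng.choices(range(len(betas)), weights=betas)[0]
        is_fake = rng.random() < pis[cat]
        is_tight = rng.random() < (eff_f if is_fake else eff_r)
        leptons.append((cat, 'f' if is_fake else 'r', 't' if is_tight else 'l'))
    return leptons
```

Setting π_i = 0 for a signal tree, or π_i = 1 for a fake tree, reproduces the pure-real and pure-fake components used in Section III.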

C. Method C: Maximum likelihood estimate
It is later found that Method B is computationally intractable for all but very simple systems. As such, we propose a third method that keeps many of the desirable properties of Method B, but with a much reduced computation time, closer to that of Method A. This is achieved by using a simple likelihood for limit setting, as in Method A, but feeding it with the true maximum likelihood estimate (MLE) of the fake rate.
To form an upper limit with Method C, one does the following. Firstly, for the observed data, maximise the likelihood expressed in equation (14) over all nuisance parameters. As mentioned in the discussion of Method B, this likelihood should contain sufficient parameters to describe the signal process, the fake background, and any other real backgrounds. The output from this that shall be used is the MLE fake rate with an estimated uncertainty. This uncertainty represents both the uncertainty with which the efficiencies are known and the statistical limitations of the observed data. It is estimated with the MINOS method [39], by taking the values of the fake rate where the negative log likelihood, minimised with respect to the remaining parameters, increases by 0.5 from its minimum value. A limit is then placed using an expression identical to that in equation (11), where b̄ and σ_b take the aforementioned MLE fake rate and uncertainty.
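For the single-lepton model, the core of this chain (a profiled likelihood, the MLE fake rate, and a MINOS-style Δ(−ln L) = 0.5 interval) fits in a short sketch. Grid scans below stand in for MINUIT/MINOS, the function names are ours, and efficiency uncertainties are omitted for brevity:

```python
import math

def profiled_nll_fake(nu_f, n_t, n_l, eff_r, eff_f, steps=300):
    """-ln L(nu_f), profiled over the real rate nu_r, for the single-lepton model:
    n_t ~ Poisson(eff_r*nu_r + eff_f*nu_f),
    n_l ~ Poisson((1-eff_r)*nu_r + (1-eff_f)*nu_f)."""
    def nll(nu_r):
        mean_t = eff_r * nu_r + eff_f * nu_f
        mean_l = (1.0 - eff_r) * nu_r + (1.0 - eff_f) * nu_f
        if mean_t <= 0.0 or mean_l <= 0.0:
            return float('inf')
        return (mean_t - n_t * math.log(mean_t) + math.lgamma(n_t + 1)
                + mean_l - n_l * math.log(mean_l) + math.lgamma(n_l + 1))
    hi = 10.0 * (n_t + n_l) + 10.0
    return min(nll(1e-6 + i * hi / steps) for i in range(steps + 1))

def mle_and_minos(n_t, n_l, eff_r, eff_f, steps=300):
    """MLE fake rate and the interval where -ln L rises by 0.5 (MINOS-style sketch)."""
    hi = 3.0 * (n_t + n_l) + 5.0
    grid = [1e-6 + i * hi / steps for i in range(steps + 1)]
    vals = [profiled_nll_fake(x, n_t, n_l, eff_r, eff_f) for x in grid]
    nll_min = min(vals)
    best = grid[vals.index(nll_min)]
    inside = [x for x, v in zip(grid, vals) if v <= nll_min + 0.5]
    return best, min(inside), max(inside)
```

For n_T = 5, n_L = 2 and (ε_r, ε_f) = (0.9, 0.2), the MLE coincides with the matrix method value (ε_r n_L − ε̄_r n_T)/(ε_r − ε_f) ≈ 1.86, because that value happens to be positive; when the matrix method value would be negative, the MLE instead sits at the physical boundary, illustrating why Method C avoids unphysical estimates.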

III. COMPARISONS USING FREQUENTIST LIMITS
Using a toy event generator, written by the authors, datasets are produced using the same method as that depicted in Figure 1, containing a mixture of 'fake' and 'signal' events. For each of several configurations, 19000 independent datasets were formed using the generator. Each of these was subsequently processed using Methods A, B and C. In all cases the necessary minimisation of a negative log likelihood was performed using the Minuit2 library [39]. The results are 95% CL_{s+b} and CL_s upper limits on the signal strength parameter 10 .
There is some discussion in the literature regarding how the incorporation of background components comprising a mean with some uncertainty affects the frequentist coverage properties of p-value limits [41]. In particular, when one is considering a background that is constrained e.g. from an MC sample, the acceptance region for the hypothesis test in the full Neyman construction will vary according to the value assumed by the nuisance parameter(s) controlling the strength of the background. In an approximated scheme, such as the profiling method used in the computation of CL_{s+b} and CL_s, the coverage can hence deviate from that nominally expected; potentially significantly if the background overestimates the data. Since both Methods A and C feed information into the likelihood in a similar way (and have the shortcoming that the likelihood used in the limit-setting procedure is not the likelihood for all the data), we should not be surprised if one or both methods under- or over-cover. It is hoped, however, that by virtue of the MLE fake rate being more 'sensible' than that from the matrix method, any deviations in coverage from that nominally expected would be less extreme in Method C than with Method A. Method B should have the most accurate coverage, although it still might not be exactly correct due to the use of profiling. These expectations are confirmed in the results which follow.

10 The p-values used to compute CL_s and CL_{s+b} are computed by performing pseudo-experiments, rather than using asymptotic methods [40], since it is known that the latter are only a good approximation for scenarios with a large number of events. In this work we focus on regions with low numbers of events.

FIG. 1: The probability tree in this figure illustrates the model used to parameterise fake and real lepton production, as used in Method B and Method C. The left-most branch (for one lepton) is complete; the others are not, as indicated by the presence of ". . . ". In general one could allow both for more lepton categories and for more leptons in the event. Note that Σ_{m=0}^{m_max} α_m = Σ_{i=1}^{N_ω} β_i = 1, where m_max is the largest number of leptons that can be produced in a given event. Additionally, the abbreviation π̄_i = 1 − π_i is used.

Simple scenario -two leptons, two categories
Firstly, a configuration is used that produces events always with exactly two leptons, each of which can be in one of two categories. There are separate configurations for a signal process, which produces only real leptons (π₁ = π₂ = 0), and a fake process which produces only fake leptons (π₁ = π₂ = 1). The full set of parameters can be found in Appendix B in Table II. In each dataset, 100 events are produced using the tree in Figure 1. As such, the number of T events is approximately the sum of two Poisson random variables: one representing the signal component with mean 0.706, and another representing the fake background with a mean of 1.94.
The CL_{s+b} and CL_s limits from each of the 19000 generated datasets are shown in Figure 2. The CL_{s+b} limit is shown to have approximately correct coverage for Method A and Method B, but Method C over-covers; deviations from 95% at this level can be attributed to the use of a profiled test statistic rather than the full Neyman construction. This is considered further for the next example. There is also significant over-coverage in the CL_s limit; however, this is expected from the definition CL_s = CL_{s+b} / (1 − CL_b). In low-statistics regimes, often (1 − CL_b) ≪ 1, meaning that CL_s > CL_{s+b} by a potentially significant margin.
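The CL_s construction referred to above can be illustrated with a toy counting experiment. This sketch is our own (using the observed count itself as the test statistic); it makes explicit why CL_s ≥ CL_{s+b} whenever the denominator is below one:

```python
import math
import random

def cls_from_toys(n_obs, mu, b, n_toys=20000, seed=1):
    """Toy-based CL_{s+b} and CL_s for a single-bin counting experiment (sketch).

    The observed count is the test statistic; CL_{s+b} = P(n <= n_obs | mu + b),
    and the denominator P(n <= n_obs | b) plays the role of (1 - CL_b) above.
    """
    rng = random.Random(seed)
    def poisson_draw(lam):
        # Knuth's multiplication method; adequate for the small means used here
        limit, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1
    p_sb = sum(poisson_draw(mu + b) <= n_obs for _ in range(n_toys)) / n_toys
    p_b = sum(poisson_draw(b) <= n_obs for _ in range(n_toys)) / n_toys
    return p_sb, (p_sb / p_b if p_b > 0.0 else 1.0)
```

Since the denominator is a probability and hence at most one, the CL_s p-value can never be smaller than CL_{s+b}, which is the conservatism referred to in the text.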
Furthermore, a division of the CL_{s+b} limit according to the number of events observed in the signal region, n_T, is also shown in Figure 2. From this figure it can be seen that, whilst overall very similar limits are placed by all three methods, Method B tends to be the most constraining, its distributions showing longer lower tails. Method B and Method C are together significantly more constraining than Method A (signified by shorter upper tails), and are quite similar to each other; this is encouraging in justifying the use of Method C as an approximation to Method B. Method B is slightly more likely to place a tighter limit, as is to be expected since it makes optimal use of all available information.
A further comparison that can be made is of the fake rate that is the output of the matrix method in Method A, against the MLE of the fake rate obtained in Method B and Method C; this is shown in Figure 3. The spread in the plot demonstrates the property that Method A can predict a negative fake rate, as seen in a significant portion of the generated datasets. It also shows that Method B produces fake rates that cluster more closely around the true value, even at low n T .

Harder scenario -two leptons, eight categories
The simple scenario above has been extended to use eight categories instead of two. As per the parameterisation in Figure 1, this involves the addition of 24 extra parameters: twelve each for the signal and the fake background, from the addition of six β and six π terms. The full set of parameters can be found in Appendix B in Table III. As before, 100 events were generated in each dataset, corresponding to a signal rate of 0.748 and a fake background rate of 2.77.
It was found that the increase in parameter space dimensionality was sufficient to increase the computation time for the likelihood maximisation to such an extent that producing limits with Method B became infeasible using the resources at the authors' disposal; as such, only results from Method A and Method C could be computed. Figure 5 shows that the MLE fake rate for Method C is much more tightly constrained around the true value than the Method A estimate; moreover, Method A gives even more significant deviations into negative values than in the simple scenario. Furthermore, as n_T increases, the median fake rate from Method A decreases slightly, whereas that from Method C is stable for low event counts, only increasing slightly for larger n_T; the Method C behaviour seems more desirable here. Secondly, Figure 4 shows that the CL_{s+b} limits derived in Method A suffer from under-coverage 11 ; the upper limit only bounds the true rate 92% of the time rather than the expected 95%. Finally, the upper tails of the CL_{s+b} limit are significantly more pronounced in Method A than in Method C, as can be seen when the limits are separated by n_T, as also included in Figure 4.

FIG. 3: For the simple scenario described in the text (two leptons in two categories), in which computations using Method B remain tractable, the estimated fake rates for each of 19000 independent toy datasets are shown as a function of n_T, comparing Method A and Method B with box plots. The fake rate from Method C is by definition the same as that from Method B. The box plots indicate the median and lower & upper quartiles with the box, while the whiskers extend to the most extreme datum within 1.5 × inter-quartile range of the nearest quartile; this corresponds to the k = 1.5 case as described in [42]. Black dots mark data points outside the range of the whiskers. The dashed blue line marks the true value of ν_TF = 1.94, and the red line delimits the unphysical ν_TF < 0 region.

IV. CONCLUSIONS
We have described the matrix method, used in many ATLAS and CMS analyses to estimate fake leptonic backgrounds, more completely than we have seen elsewhere. We have shown that it (Method A) produces an MLE fake rate only under a restrictive set of conditions, and that these are rarely met in practice. We have shown that it has a number of undesired properties which result from its heuristic definition: (i) it can give physically meaningless results (predict negative fake rates), (ii) its fake rate estimates show an undesired bias as a function of the number of tight events in the signal region (as seen by the slope observed in the run of green boxes of Figures 3 and 5), and (iii) the limits it sets on fake rates are significantly more variable than those from better methods (as seen by the increased vertical extent of the Method A histograms in Figures 2 and 4 compared to those of Method B and Method C). We noted that, within the constraints of the frequentist profile-likelihood based framework considered, one cannot hope to constrain fake rates much better than Method B. However, we saw that the computational overheads of Method B precluded its use in all but the simplest of cases. Finally, we showed that it was possible to find a third approach, Method C, which is computationally of similar complexity to Method A and, though to some extent also heuristic in its definition, nonetheless reproduces much more closely the fake rate estimates of Method B. Method C, in contrast to the matrix method: (i) gives only physically meaningful results (predicts non-negative fake rates), (ii) its fake rate estimates are unbiased as a function of the number of tight events in the signal region (as seen by the lack of slope across the yellow boxes of Figures 3 and 5), and (iii) the limits it sets on fake rates are significantly less variable than those of Method A while being very close to those seen in the optimal Method B (again see Figures 2 and 4).
The improvements seen in Method C over Method A are particularly notable in signal regions having few events. A possible advantage of Method B and Method C not explored in this paper is afforded by their ability, if desired, to encapsulate background processes which can contribute by both real and fake events e.g. in different decay modes by use of different parameter trees. Further to this, it may be found that measurements in additional regions could constrain some of these parameters, much like the efficiencies are already constrained. Whilst speculative, further research in this area has the potential to provide even greater benefits over the original matrix method.

Appendix A: Origin of matrix method approximation
It was stated earlier that, under appropriate conditions, n̂_R and n̂_F are maximum likelihood estimators for n̄_R ≡ E[n_R | n_T, n_L] and n̄_F ≡ E[n_F | n_T, n_L]. This result, together with its limitations, is now presented.

Single lepton and single category
We shall first demonstrate this approximation in the simplified case of a single lepton and a single category. The corresponding fully general derivation follows in the next section, but it proceeds by largely the same logic.
When considering a likelihood as a product of Poisson terms as in equation (11), and neglecting the Gaussian terms involving the efficiencies, the negative log likelihood for the term arising from the background component will be

$$ -\ln L = \sum_{\mathcal{T} \in \{T, L\}} \left[ \nu_{\mathcal{T}} - n_{\mathcal{T}} \ln \nu_{\mathcal{T}} + \ln n_{\mathcal{T}}! \right] . \tag{A1} $$

Here we sum over two constraints, one from tight events and the other from loose. The means of the Poisson distributions are denoted by $\nu_{\mathcal{T}}$. From equations (2) and (8), one finds

$$ \nu_{\mathcal{T}} = \sum_{\mathcal{R} \in \{R, F\}} \phi_{\mathcal{T}\mathcal{R}} \, \nu_{\mathcal{R}} , \tag{A2} $$

where $\nu_{\mathcal{R}}$ denotes the means of the Poisson distributions from which the numbers of real and fake events are drawn, and $\phi$ is the matrix of efficiencies, with indices $\mathcal{T}$ and $\mathcal{R}$ referring to tight/loose and real/fake properties respectively.
One can now differentiate equation (A1) with respect to $\nu_{\mathcal{R}}$, $\forall \mathcal{R} \in \{R, F\}$, using the identity in equation (A2), and find the MLE values for the rates, denoted $\nu_{\mathcal{T},\mathrm{MLE}}$. In order to locate the minimum of the negative log likelihood, one sets these derivatives to 0, yielding

$$ \sum_{\mathcal{T} \in \{T, L\}} \left( 1 - \frac{n_{\mathcal{T}}}{\nu_{\mathcal{T},\mathrm{MLE}}} \right) \phi_{\mathcal{T}\mathcal{R}} = 0 \quad \forall \mathcal{R} \in \{R, F\} . \tag{A3} $$

These are satisfied if $\nu_{\mathcal{T},\mathrm{MLE}} = n_{\mathcal{T}} \ \forall \mathcal{T} \in \{T, L\}$, the result of which being that, upon inversion, equation (A2) will look like

$$ \nu_{\mathcal{R},\mathrm{MLE}} = \sum_{\mathcal{T} \in \{T, L\}} \left( \phi^{-1} \right)_{\mathcal{R}\mathcal{T}} n_{\mathcal{T}} , \tag{A4} $$

analogously to equation (3). Whilst this is a valid operation for the problem as stated above, it should be noted that the minimum of $-\ln L$ is represented by equation (A3) only when the components of $\nu_{\mathcal{R},\mathrm{MLE}}$ are $> 0$.
To conclude, equation (A4) shows how the matrix method estimator $\hat{n}_{\mathcal{R}}$ is identical to the MLE values $\nu_{\mathcal{R},\mathrm{MLE}}$ in the simplified single-lepton matrix method, provided the condition above is met.
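The single-lepton result can be checked numerically. The sketch below (an illustration under assumed efficiency values, not code from the paper) builds the Poisson negative log likelihood of equation (A1) with the constraint (A2), obtains the candidate MLE by direct inversion as in (A4), and verifies that no small perturbation of the rates lowers the negative log likelihood:

```python
import numpy as np

# Check numerically that the matrix-method inversion sits at the minimum of
# the Poisson negative log-likelihood, when the inverted rates are physical.
eps_r, eps_f = 0.8, 0.1                       # assumed efficiencies
phi = np.array([[eps_r,     eps_f],
                [1 - eps_r, 1 - eps_f]])
n = np.array([40.0, 25.0])                    # observed (n_T, n_L)

def nll(nu_RF):
    """-ln L up to the constant ln n!, with nu_{T,L} given by (A2)."""
    nu_TL = phi @ nu_RF
    return np.sum(nu_TL - n * np.log(nu_TL))

nu_mle = np.linalg.solve(phi, n)              # the inversion of eq. (A4)

# Both components are positive here, so the stationary point is the minimum:
# the NLL should not decrease in any direction around nu_mle.
for d in [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1)]:
    assert nll(nu_mle + 0.05 * np.array(d)) >= nll(nu_mle)
```

Because the Poisson NLL is convex in the means and $\phi$ is invertible, the stationary point $\nu_{\mathcal{T},\mathrm{MLE}} = n_{\mathcal{T}}$ is the unique minimum whenever the inverted rates are positive; when they are not, the true MLE sits on the physical boundary and the inversion no longer coincides with it.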

Multiple leptons and multiple categories
When considering a likelihood as a product of Poisson terms as in equation (11), and neglecting the Gaussian terms involving the efficiencies, the negative log likelihood for the term arising from the background component will be

$$ -\ln L = \sum_{\omega} \sum_{\beta} \left[ \nu^{\mathcal{T}}_{\omega\beta} - n^{\mathcal{T}}_{\omega\beta} \ln \nu^{\mathcal{T}}_{\omega\beta} + \ln n^{\mathcal{T}}_{\omega\beta}! \right] , \tag{A5} $$

where for a set of $m$ leptons the categories and tight/looseness information are compacted into vectors $\omega$ and $\beta$ of length $m$ respectively. Note also that the means of the Poisson distributions are denoted in the general notation by e.g. $\nu^{\mathcal{T}}_{\omega\beta}$. From equations (2) and (8), one finds

$$ \nu^{\mathcal{T}}_{\omega\beta} = \sum_{\alpha} \phi^{\omega}_{\beta\alpha} \, \nu^{\mathcal{R}}_{\omega\alpha} , \tag{A6} $$

where $\alpha$ is a vector representing whether each lepton is real or fake.
One can now differentiate equation (A5) with respect to $\nu^{\mathcal{R}}_{\omega\alpha}$, $\forall \omega, \alpha$, using the identity in equation (A6), and find the MLE values for the rates, denoted e.g. $\nu^{\mathcal{T},\mathrm{MLE}}_{\omega\beta}$. In order to locate the minimum of the negative log likelihood, one sets all these derivatives to 0, yielding

$$ \sum_{\beta} \left( 1 - \frac{n^{\mathcal{T}}_{\omega\beta}}{\nu^{\mathcal{T},\mathrm{MLE}}_{\omega\beta}} \right) \phi^{\omega}_{\beta\alpha} = 0 \quad \forall \omega, \alpha . \tag{A7} $$

These are satisfied if $\nu^{\mathcal{T},\mathrm{MLE}}_{\omega\beta} = n^{\mathcal{T}}_{\omega\beta} \ \forall \beta$, the result of which being that, upon inversion, equation (A6) will look like

$$ \nu^{\mathcal{R},\mathrm{MLE}}_{\omega\alpha} = \sum_{\beta} \left( (\phi^{\omega})^{-1} \right)_{\alpha\beta} n^{\mathcal{T}}_{\omega\beta} , \tag{A8} $$

analogously to equation (3). Whilst this is a valid operation for the problem as stated above, it should be noted that the minimum of $-\ln L$ is represented by equation (A7) only when the components of $\nu^{\mathcal{R},\mathrm{MLE}}_{\omega\alpha}$ are $> 0$. Incidentally, the result is also only useful in the case where the components of $\nu^{\mathcal{R},\mathrm{MLE}}_{\omega\alpha}$ can readily be assigned either to signal plus other 'real' backgrounds (those typically estimated from MC samples) or to the fake background.
To conclude, equation (A8) shows how the matrix method estimator $\hat{n}_{\mathcal{R}}$ is identical to the MLE values $\nu^{\mathcal{R},\mathrm{MLE}}_{\omega\alpha}$ in the fully generalised matrix method, provided the conditions above are met.
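For independent leptons, the generalised efficiency matrix for a fixed category combination factorises into a Kronecker product of the single-lepton matrices, which makes the inversion of equation (A8) straightforward to sketch. The example below (an illustration only; per-lepton efficiencies are the assumed values of Table II's two categories, and the observed counts are invented) treats m = 2 leptons:

```python
import numpy as np

# Sketch of the generalised inversion (A8) for m = 2 leptons at a fixed
# category combination omega. Assuming leptons are independent, the 4x4
# matrix phi^omega, with row index beta (tight/loose pattern) and column
# index alpha (real/fake pattern), is the Kronecker product of the
# single-lepton 2x2 matrices.
def single_phi(eps_r, eps_f):
    return np.array([[eps_r,     eps_f],
                     [1 - eps_r, 1 - eps_f]])

phi1 = single_phi(0.8, 0.1)   # lepton 1: assumed efficiencies (category 1)
phi2 = single_phi(0.9, 0.2)   # lepton 2: assumed efficiencies (category 2)
phi = np.kron(phi1, phi2)     # rows: beta in {TT, TL, LT, LL}
                              # cols: alpha in {RR, RF, FR, FF}

# Invented observed counts n_{omega beta} for the four tight/loose patterns.
n_beta = np.array([30.0, 12.0, 9.0, 6.0])

# nu^{R,MLE}_{omega alpha}: yields for (RR, RF, FR, FF) event types.
nu_mle = np.linalg.solve(phi, n_beta)
```

Since each column of the single-lepton matrices sums to one, every column of the Kronecker product does too, so the inverted yields always sum to the total observed count; as in the single-lepton case, however, individual components can come out negative, which is the pathology Methods B and C avoid.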

Appendix B: Tables of parameters
Parameters used to configure the toy generator may be found in the following tables:

                Signal                 Background             Object
    category    νL      β      π       νL      β      π       εr      εf
    ω1          0.01    0.6    0       0.99    0.6    1       0.8     0.1
    ω2          -       0.4    0       -       0.4    1       0.9     0.2

TABLE II: Parameters controlling the simple scenario with exactly two leptons, and two categories for each lepton. The parameters are as described in Figure 1, however α2 = 1 and αi = 0 ∀i ≠ 2. The overall production rate of events is νL, each one of which is filtered through the decision tree. Components marked with a '-' are not applicable in the context.
[1] Search for strong production of supersymmetric particles in final states with missing transverse momentum and at least three b-jets using 20.