Exclusion probabilities and likelihood ratios with applications to mixtures

Abstract

The statistical evidence obtained from mixed DNA profiles can be summarised in several ways in forensic casework including the likelihood ratio (LR) and the Random Man Not Excluded (RMNE) probability. The literature has seen a discussion of the advantages and disadvantages of likelihood ratios and exclusion probabilities, and part of our aim is to bring some clarification to this debate. In a previous paper, we proved that there is a general mathematical relationship between these statistics: RMNE can be expressed as a certain average of the LR, implying that the expected value of the LR, when applied to an actual contributor to the mixture, is at least equal to the inverse of the RMNE. While the mentioned paper presented applications for kinship problems, the current paper demonstrates the relevance for mixture cases, and for this purpose, we prove some new general properties. We also demonstrate how to use the distribution of the likelihood ratio for donors of a mixture, to obtain estimates for exceedance probabilities of the LR for non-donors, of which the RMNE is a special case corresponding to LR>0. In order to derive these results, we need to view the likelihood ratio as a random variable. In this paper, we describe how such a randomization can be achieved. The RMNE is usually invoked only for mixtures without dropout. In mixtures, artefacts like dropout and drop-in are commonly encountered and we address this situation too, illustrating our results with a basic but widely implemented model, a so-called binary model. The precise definitions, modelling and interpretation of the required concepts of dropout and drop-in are not entirely obvious, and we attempt to clarify them here in a general likelihood framework for a binary model.

References

  1. Balding D, Buckleton J (2009) Interpreting low template DNA profiles. Forensic Sci Int Genet 4(1):1–10
  2. Buckleton J, Curran J (2008) A discussion of the merits of random man not excluded and likelihood ratios. Forensic Sci Int Genet 2:343–348
  3. Buckleton J, Triggs C, Walsh S (eds) (2005) Forensic DNA Evidence Interpretation. CRC Press, Florida, USA
  4. Cowell R, Graversen T, Lauritzen S, Mortera J (2015) Analysis of forensic DNA mixtures with artefacts. J R Stat Soc Ser C Appl Stat 64(1):1–48
  5. Curran J, Gill P, Bill M (2005) Interpretation of repeat measurement DNA evidence allowing for multiple contributors and population substructure. Forensic Sci Int 148(1):47–53
  6. Dørum G, Kling D, Baeza-Richer C, Magariños MG, Sæbø S, Desmyter S, Egeland T (2014) Models and implementation for relationship problems with dropout. Int J Legal Med 129(3):411–423
  7. Gill P, Gusmão L, Haned H, Mayr W, Morling N, Parson W, Prieto L, Prinz M, Schneider H, Schneider P, Weir B (2012) DNA commission of the International Society of Forensic Genetics: Recommendations on the evaluation of STR typing results that may include drop-out and/or drop-in using probabilistic methods. Forensic Sci Int Genet 6(6):679–688
  8. Gill P, Haned H (2013) A new methodological framework to interpret complex DNA profiles using likelihood ratios. Forensic Sci Int Genet 7:251–263
  9. Haned H, Slooten K, Gill P (2012) Exploratory data analysis for the interpretation of low template DNA mixtures. Forensic Sci Int Genet 6(6):762–774
  10. Kruijver M (2015) Efficient computations with the likelihood ratio distribution. Forensic Sci Int Genet 14:116–124
  11. Kruijver M, Meester R, Slooten K (2015) P-values should not be used for evaluating the strength of DNA evidence. Forensic Sci Int Genet 16:226–231
  12. Nothnagel M, Schmidtke J, Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med 124(3):205–215
  13. Slooten K, Meester R (2011) Forensic identification: the Island Problem and its generalizations. Statistica Neerlandica 65:202–237
  14. Slooten K, Egeland T (2014) Exclusion probabilities and likelihood ratios with applications to kinship problems. Int J Legal Med 128(3):415–425
  15. Slooten K, Meester R (2014) Probabilistic strategies for familial DNA searching. J R Stat Soc Ser C Appl Stat 63(3):361–384
  16. Steele C, Balding D (2014) Statistical evaluation of forensic DNA profile evidence. Annual Review of Statistics and Its Application 1:361–384
  17. Thompson E (2000) Statistical inference from genetic data on pedigrees. In: NSF-CBMS regional conference series in probability and statistics. JSTOR
  18. Westen A, Kraaijenbrink T, de Medina AR, Harteveld J, Willemse P, Zuniga S, van der Gaag K, Weiler N, Warnaar J, Kayser M, Sijen T, de Knijff P (2014) Comparing six commercial autosomal STR kits in a large Dutch population sample. Forensic Sci Int Genet 10:55–63

Acknowledgments

The work of the second author leading to these results was financially supported by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 285487 (EUROFORGEN-NoE).

Author information

Corresponding author

Correspondence to Thore Egeland.

Appendices

Appendix A: LR properties

In this section, we prove the properties (2.13) and (2.15). The first one is a generalization of the proof of Eq. 2.11 in [14].

$$\begin{array}{@{}rcl@{}} P(LR(\mathcal{H}_{d}) \geq x) &=& \sum\limits_{y \geq x}P(LR(\mathcal{H}_{d})=y)\\ &=& \sum\limits_{y \geq x}\frac{1}{y}P(LR(\mathcal{H}_{p})=y)\\ &=& \sum\limits_{y \geq x} \frac{1}{y}P(LR(\mathcal{H}_{p})=y \mid LR(\mathcal{H}_{p}) \geq x)\\&&P(LR(\mathcal{H}_{p}) \geq x) \\ &=& P(LR(\mathcal{H}_{p}) \geq x)E[LR^{-1} (\mathcal{H}_{p})\mid LR(\mathcal{H}_{p}) \geq x]. \end{array} $$

Note that we need x>0 in this equation, and that the inequalities may be replaced by strict inequalities.

As for the variance,

$$\begin{array}{@{}rcl@{}} Var(LR(\mathcal{H}_{d})\mid LR(\mathcal{H}_{d}) >x )&=&E[LR(\mathcal{H}_{d})^{2}\mid LR(\mathcal{H}_{d}) >x]\\&&-E[LR(\mathcal{H}_{d})\mid LR(\mathcal{H}_{d}) >x]^{2}, \end{array} $$

we compute both terms separately. First, we have

$$\begin{array}{@{}rcl@{}} E[LR(\mathcal{H}_{d}) \mid LR(\mathcal{H}_{d}) > x] &=& \sum\limits_{y>x} y P(LR(\mathcal{H}_{d})=y \mid LR(\mathcal{H}_{d}) > x) \\ &=& \sum\limits_{y>x} y \frac{P(LR(\mathcal{H}_{d})=y)}{P(LR(\mathcal{H}_{d})>x)} \\ &=& \sum\limits_{y>x}\frac{P(LR(\mathcal{H}_{p})=y)}{P(LR(\mathcal{H}_{d})>x)} \\ &=& \frac{P(LR(\mathcal{H}_{p}) >x)}{P(LR(\mathcal{H}_{d})>x)} \\ &=& \frac{1}{E[LR^{-1} (\mathcal{H}_{p})\mid LR(\mathcal{H}_{p}) >x]}. \end{array} $$

The other term equals

$$\begin{array}{@{}rcl@{}} E[LR(\mathcal{H}_{d})^{2}\mid LR(\mathcal{H}_{d}) >x] &=& \sum\limits_{y>x} y^{2} P(LR(\mathcal{H}_{d})=y \mid LR(\mathcal{H}_{d}) > x) \\ &=& \sum\limits_{y>x} y^{2} \frac{P(LR(\mathcal{H}_{d})=y)}{P(LR(\mathcal{H}_{d})>x)}\\ &=& \sum\limits_{y>x}y\frac{P(LR(\mathcal{H}_{p})=y)}{P(LR(\mathcal{H}_{d})>x)}\\ &=& \frac{1}{P(LR(\mathcal{H}_{d})>x)}\sum\limits_{y>x}yP(LR(\mathcal{H}_{p})\\&&=y \mid LR(\mathcal{H}_{p})>x)\\ &&\cdot P(LR(\mathcal{H}_{p})>x) \\ &=& \frac{P(LR(\mathcal{H}_{p})>x)}{P(LR(\mathcal{H}_{d})>x)}\\&& E[LR(\mathcal{H}_{p})\mid LR(\mathcal{H}_{p})>x] \\ &=& \frac{E[LR(\mathcal{H}_{p}) \mid LR(\mathcal{H}_{p})>x]}{E[LR^{-1} (\mathcal{H}_{p})\mid LR(\mathcal{H}_{p}) >x]}. \end{array} $$

Combining these results completes the argument:

$$\begin{array}{@{}rcl@{}} Var(LR(\mathcal{H}_{d})\mid LR(\mathcal{H}_{d}) >x ) &=& \frac{E[LR(\mathcal{H}_{p}) \mid LR(\mathcal{H}_{p})>x]}{E[LR^{-1} (\mathcal{H}_{p})\mid LR(\mathcal{H}_{p}) >x]}-\frac{1}{E[LR^{-1} (\mathcal{H}_{p})\mid LR(\mathcal{H}_{p}) >x]^{2}}\\ &=& \frac{E[LR(\mathcal{H}_{p}) \mid LR(\mathcal{H}_{p})>x]E[LR^{-1} (\mathcal{H}_{p})\mid LR(\mathcal{H}_{p}) >x]-1}{E[LR^{-1} (\mathcal{H}_{p})\mid LR(\mathcal{H}_{p}) >x]^{2}}. \end{array} $$
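Both identities can be checked mechanically on any discrete LR distribution. The following is a minimal sketch on a hypothetical four-point toy distribution (the values are invented for illustration and do not come from the paper); it uses exact rational arithmetic and only assumes the relation \(P(LR(\mathcal{H}_{p})=y)=y\,P(LR(\mathcal{H}_{d})=y)\) that the proofs rely on.

```python
from fractions import Fraction as F

# Hypothetical toy distribution (not from the paper): four possible LR values
# and their probabilities under H_d. Under H_p each value y has probability
# y * P(LR(H_d) = y), the relation used in the proofs above.
lr = [F(1, 2), F(1), F(2), F(4)]
p_d = [F(1, 2), F(7, 20), F(1, 10), F(1, 20)]
p_p = [y * q for y, q in zip(lr, p_d)]

def tail(probs, x):
    """P(LR > x) under the distribution given by probs."""
    return sum(q for y, q in zip(lr, probs) if y > x)

def cond_moment(probs, x, power):
    """E[LR**power | LR > x] under the distribution given by probs."""
    return sum(q * y**power for y, q in zip(lr, probs) if y > x) / tail(probs, x)

x = F(3, 4)  # any threshold x > 0 below the largest LR value

# Property (2.13), strict-inequality version.
lhs_213 = tail(p_d, x)
rhs_213 = tail(p_p, x) * cond_moment(p_p, x, -1)

# Property (2.15).
var_d = cond_moment(p_d, x, 2) - cond_moment(p_d, x, 1) ** 2
mean_p, inv_p = cond_moment(p_p, x, 1), cond_moment(p_p, x, -1)
rhs_215 = (mean_p * inv_p - 1) / inv_p**2

print(lhs_213 == rhs_213, var_d == rhs_215)
```

Because the arithmetic is exact, the two sides agree as fractions rather than only up to rounding.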

Appendix B: Detailed calculations for SNPs

We explain some of the mathematical expressions based on a simple example without dropout or drop-in. There are two contributors to a mixture, and the question is whether a person S has contributed (corresponding to \(H_{p}\)) or not (corresponding to \(H_{d}\)). We assume the contributors to be unrelated, and the suspect to be either a contributor or unrelated to the contributors. This means, for example, that \(P(\mathcal {M}=M \mid \mathcal {S}=g, \mathcal {S} \neq \mathcal {C}_{1})=P(\mathcal {M}=M)\), since the genotype of a non-contributor who is unrelated to the contributors does not influence the mixture’s likelihood. Similarly, \(P(\mathcal {C}_{i}=g)=f_{g}\) for both contributors. We work with the hypotheses as random variables according to Eqs. 2.4 and 2.5, meaning that we also regard the mixture itself as random.

B.1 One marker

Only one SNP marker is considered initially, and the frequencies of the alleles denoted 1 and 2 are p and 1−p, with p=0.4 in the numerical examples. We next exemplify Eqs. 2.11–2.15. Table 1 shows the possible mixtures along with their probabilities and RMNE, and Table 2 gives the distribution of LR.

Table 1 The possible two-person SNP mixtures, with their probability of occurring and their RMNE, if the minor allele has frequency 0.4
Table 2 The distribution of LR for a two-person SNP mixture

As an example, note that LR attains the maximal value \(1/p^{2}=6.25\) when both contributors are homozygous for the rarer allele, and this occurs with probabilities

$$\begin{array}{@{}rcl@{}} P(LR(\mathcal{H}_{p})=6.25)&=&P(\mathcal{M}=(1), \mathcal{S}=(1/1) \mid \mathcal{S}=\mathcal{C}_{1})=p^{4}=0.0256, \end{array} $$
$$\begin{array}{@{}rcl@{}}P(LR(\mathcal{H}_{d})=6.25)&=&P(\mathcal{M}=(1), \mathcal{S}=(1/1) \mid \mathcal{S} \neq \mathcal{C}_{1}) =p^{6}=0.004096.\end{array} $$

In Table 3, we give the distributions conditional on the LR being greater than one.

Table 3 The distribution of LR for a two person SNP mixture conditioned on LR>1

We first confirm (2.11) with input from Tables 1 and 2:

$$RMNE=0.0256 \cdot 0.1600 + 0.8448 \cdot 1.0000+0.1296 \cdot 0.3600=0.8956,$$
$$E[LR^{-1}(\mathcal{H}_{p})]=\frac{1}{0.7576}0.2304+ {\cdots} +\frac{1}{6.2500} \cdot 0.0256=0.8956.$$

Note that

$$\begin{array}{@{}rcl@{}} E[LR(\mathcal{H}_{p})]=0.2304 \cdot 0.7576+{\cdots} +0.0256 \cdot 6.2500=1.3964 \end{array} $$

which exceeds 1/RMNE=1.1166 as it should according to Eq. 2.12. Note that, since we are working with random variables as in Eqs. 2.4 and 2.5, the RMNE probability is an average over all possible mixtures.
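The one-marker quantities above can be reproduced by brute-force enumeration. The sketch below is an independent illustration, not the authors' code; the helper `mixture` and all variable names are hypothetical. It assumes two unrelated contributors, no dropout and no drop-in, as in this appendix.

```python
from itertools import product

p = 0.4  # frequency of allele 1, as in the text
geno = {(1, 1): p**2, (1, 2): 2 * p * (1 - p), (2, 2): (1 - p) ** 2}

def mixture(g1, g2):
    """Alleles shown by two contributors (no dropout, no drop-in)."""
    return frozenset(g1 + g2)

# P(M = m) when both contributors are drawn at random from the population.
p_mix = {}
for (g1, f1), (g2, f2) in product(geno.items(), repeat=2):
    m = mixture(g1, g2)
    p_mix[m] = p_mix.get(m, 0.0) + f1 * f2

# RMNE per mixture (a random person shows no allele outside m), then averaged.
rmne_m = {m: sum(f for g, f in geno.items() if set(g) <= m) for m in p_mix}
rmne = sum(p_mix[m] * rmne_m[m] for m in p_mix)

# P(M = m | S = g, S = C_1): the second contributor is random.
p_m_given_s = {}
for g, (g2, f2) in product(geno, geno.items()):
    key = (mixture(g, g2), g)
    p_m_given_s[key] = p_m_given_s.get(key, 0.0) + f2

# LR and the probability of each (mixture, genotype) pair under H_p.
lr = {k: v / p_mix[k[0]] for k, v in p_m_given_s.items()}
p_hp = {k: geno[k[1]] * v for k, v in p_m_given_s.items()}

e_inv_lr = sum(p_hp[k] / lr[k] for k in lr)   # E[LR^{-1}(H_p)], cf. Eq. 2.11
e_lr = sum(p_hp[k] * lr[k] for k in lr)       # E[LR(H_p)], cf. Eq. 2.12

print(f"RMNE = {rmne:.4f}, E[1/LR] = {e_inv_lr:.4f}, E[LR] = {e_lr:.4f}")
```

Changing `p` shows how the gap between \(E[LR(\mathcal{H}_{p})]\) and 1/RMNE depends on the allele frequency.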

Consider next (2.13), i.e.

$$\begin{array}{@{}rcl@{}} P(LR(\mathcal{H}_{d}) > x)=P(LR(\mathcal{H}_{p})> x) \cdot E[LR^{-1} (\mathcal{H}_{p})\mid LR(\mathcal{H}_{p}) > x]. \end{array} $$

We exemplify for x=1. From Tables 2 and 3

$$\begin{array}{@{}rcl@{}} &P(LR(\mathcal{H}_{d})>1)=0.4055+0.0467+0.0041=0.4563\\ &P(LR(\mathcal{H}_{p})>1)=0.4800+0.1296+0.0256=0.6352\\ \end{array} $$
$$E[LR^{-1} (\mathcal{H}_{p})\mid LR(\mathcal{H}_{p}) > 1]= \frac{0.7557}{1.1837} +\frac{0.2040}{2.7778}+\frac{0.0403}{6.2500} = 0.7183$$

which confirms (2.13), i.e. 0.4563=0.6352⋅0.7183.

Consider next (2.15), i.e.

$$\begin{array}{@{}rcl@{}} && Var(LR(\mathcal{H}_{d})\mid LR(\mathcal{H}_{d}) > 1)=\\ && \frac{E[LR(\mathcal{H}_{p}) \mid LR(\mathcal{H}_{p})>1]E[LR^{-1} (\mathcal{H}_{p})\mid LR(\mathcal{H}_{p}) >1]-1}{E[LR^{-1} (\mathcal{H}_{p})\mid LR(\mathcal{H}_{p}) >1]^{2}}. \end{array} $$

From Table 3,

$$\begin{array}{@{}rcl@{}} && E(LR^{2}(\mathcal{H}_{d})\mid LR(\mathcal{H}_{d}) > 1)=\\&& 1.1837^{2}\cdot 0.8888+2.7778^{2}\cdot 0.1023+6.2500^{2}\cdot 0.0090= 2.386265,\\ && E(LR(\mathcal{H}_{d})\mid LR(\mathcal{H}_{d}) > 1)=\\&& 1.1837\cdot 0.8888+2.7778\cdot 0.1023+6.2500\cdot 0.0090=1.392491,\\ && Var(LR(\mathcal{H}_{d})\mid LR(\mathcal{H}_{d}) > 1)=2.386265-1.392491^{2}=0.447. \end{array} $$

On the other hand,

$$\begin{array}{@{}rcl@{}} && E[LR(\mathcal{H}_{p})\mid LR(\mathcal{H}_{p}) > 1]=\\&&1.1837 \cdot 0.7557+2.7778 \cdot 0.2040+6.2500 \cdot 0.0403=1.7133 \end{array} $$

and so the right-hand side of Eq. 2.15 becomes

$$\begin{array}{@{}rcl@{}} \frac{1.7133 \cdot 0.7183-1}{0.7183^{2}}=0.447, \end{array} $$

as it should.

Table 4 illustrates the different interpretations of LR as a random variable discussed in section “The likelihood ratio as a random variable”.

Table 4 LR is the ratio of column 3 to 4 or 5 to 6. This exemplifies the two equivalent interpretations of LR as a random variable

The details for the first line of the table are

$$\begin{array}{@{}rcl@{}} P(\mathcal{M}=(1,2) \mid\mathcal{S} =(2/2),\mathcal{S}=\mathcal{C}_{1}) &=&p^{2}+2p(1-p)=0.6400,\\ P(\mathcal{M}=(1,2) \mid \mathcal{S} =(2/2),\mathcal{S} \neq\mathcal{C}_{1}) &=&1-p^{4}-{(1-p)}^{4}=0.8448,\\ P(\mathcal{S}=(2/2) \mid\mathcal{M}=(1,2),\mathcal{S}=\mathcal{C}_{1}) &=& \frac{(2p-p^{2}){(1-p)}^{2}}{1-p^{4}-{(1-p)}^{4}}=0.2727273,\\ P(\mathcal{S}=(2/2) \mid\mathcal{M}=(1,2),\mathcal{S} \neq\mathcal{C}_{1}) &=& {(1-p)}^{2}=0.36, \\ LR(\mathcal{M}=(1,2),\mathcal{S}=(2/2)) &=&\frac{0.6400}{0.8448}=\frac{0.2727273}{0.3600}=0.7576. \end{array} $$
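The equivalence of the two interpretations for this line is a direct application of Bayes' rule, and can be sketched numerically as follows. Variable names are hypothetical; the formulas are those displayed above with p=0.4.

```python
p = 0.4
q = 1 - p
f_22 = q**2                        # population frequency of genotype 2/2
p_m12 = 1 - p**4 - q**4            # P(M = (1,2)) for two random contributors

# Interpretation 1: probability of the mixture given the suspect's genotype.
m_given_s_hp = p**2 + 2 * p * q    # the other contributor must carry allele 1
m_given_s_hd = p_m12               # an unrelated non-contributor does not affect M
lr_1 = m_given_s_hp / m_given_s_hd

# Interpretation 2: probability of the suspect's genotype given the mixture.
s_given_m_hp = m_given_s_hp * f_22 / p_m12   # Bayes' rule under H_p
s_given_m_hd = f_22                          # M carries no information under H_d
lr_2 = s_given_m_hp / s_given_m_hd

print(round(lr_1, 4), round(lr_2, 4))
```

The prior \(f_{2/2}\) and the mixture probability cancel in the second ratio, which is why both routes give the same LR.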

B.2 Several markers

We next include several independent markers, all with the same allele frequencies as the single marker above, and first calculate \(P(Log_{10}(LR(\mathcal {H}_{p})) \leq x)\) exactly and approximately. Exact calculations are possible in this case based on functions in the R package DNAprofiles (there is some specific code at the end of this section) if the number of markers does not exceed roughly 14; for a general number of markers, we can always obtain the distribution from sampling or from the normal approximation. Figure 7 shows the distribution of \(LR(\mathcal {H}_{p})\). The dashed vertical line shows

$$Log_{10}(1/RMNE)=-14Log_{10}(0.8956)=0.6704 $$

and the upper bound

$$Log_{10}\left(E(LR(\mathcal{H}_{p}))\right)=14 Log_{10}(1.3964)=2.0301 $$

is the solid vertical line to the right.

Fig. 7

Plot of \(P(\text {Log}_{10}(LR(\mathcal {H}_{p})) \leq x)\) as a function of x for two-person mixtures, typed on 14 i.i.d. SNP markers with minor allele frequency 0.4. The dashed curve is the asymptotic distribution, whereas the solid curve corresponds to the exact distribution as well as to the one obtained from simulations (the two are indistinguishable). The dashed vertical line indicates \(Log_{10}(1/RMNE)\), which is bounded from above by \(Log_{10} (E(LR(\mathcal {H}_{p})))\), the solid vertical line.

Consider next \(P(LR(\mathcal {H}_{d})>x)\). Table 5 includes the exact values and the estimates based on importance sampling.

Table 5 Exact and simulated values are shown for 14 SNP markers with minor allele frequency 0.4 for a two person mixture
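A minimal sketch of such an importance sampling estimate is given below. It assumes the scheme implied by the identity used in Appendix A: profiles are drawn under \(\mathcal{H}_{p}\) and each draw with LR>x is weighted by 1/LR, which estimates \(P(LR(\mathcal{H}_{d})>x)\). The function name, seed and sample sizes are illustrative choices, not taken from the paper.

```python
import math
import random

p = 0.4
q = 1 - p
n_markers = 14
p_m12 = 1 - p**4 - q**4          # P(M = (1,2)) at one marker

# One-marker outcomes under H_p as (LR, probability), for a two-person SNP
# mixture without dropout or drop-in.
outcomes = [
    ((1 - q**2) / p_m12, q**2 * (1 - q**2)),  # S = 2/2, M = (1,2)
    ((1 - p**2) / p_m12, p**2 * (1 - p**2)),  # S = 1/1, M = (1,2)
    (1 / p_m12, 2 * p * q),                   # S = 1/2, M = (1,2)
    (1 / q**2, q**4),                         # S = 2/2, M = (2)
    (1 / p**2, p**4),                         # S = 1/1, M = (1)
]
lrs, probs = zip(*outcomes)

def tail_hd_importance(x, n_samples, rng):
    """Estimate P(LR(H_d) > x) from draws under H_p, each weighted by 1/LR."""
    total = 0.0
    for _ in range(n_samples):
        lr = math.prod(rng.choices(lrs, weights=probs, k=n_markers))
        if lr > x:
            total += 1.0 / lr
    return total / n_samples

rng = random.Random(2016)                     # fixed seed, for reproducibility only
rmne_14 = sum(pr / l for l, pr in outcomes) ** n_markers
est_rmne = tail_hd_importance(0.0, 20000, rng)    # x = 0 recovers the RMNE
est_100 = tail_hd_importance(100.0, 20000, rng)

print(f"RMNE (exact) = {rmne_14:.4f}, RMNE (sampled) = {est_rmne:.4f}")
print(f"P(LR(H_d) > 100) is estimated as {est_100:.2e}")
```

Sampling under \(\mathcal{H}_{p}\) rather than \(\mathcal{H}_{d}\) is the point of the design: large LRs are rare for non-donors, so naive sampling under \(\mathcal{H}_{d}\) would need far more draws to see them.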

Regarding the computation of \(P(LR(\mathcal {H}_{p}) \leq 1/RMNE)\) discussed in the main text, the code in Table 6 gives the exact answer 0.27 based on the R-package DNAprofiles.

Table 6 The probability that LR is less than 1/RMNE is calculated exactly based on the R package DNAprofiles
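For readers without access to DNAprofiles, the same probability can be obtained by direct enumeration. The sketch below is an independent Python re-implementation under the assumptions of this appendix (14 i.i.d. SNP markers, minor allele frequency 0.4, two contributors, no dropout or drop-in); it is not the R code of Table 6, and the helper `compositions` is hypothetical. It enumerates how often each of the five one-marker outcomes occurs and uses exact rational arithmetic; the text above reports 0.27 for this quantity.

```python
import math
from fractions import Fraction as F

p = F(2, 5)                       # minor allele frequency 0.4
q = 1 - p
n_markers = 14
p_m12 = 1 - p**4 - q**4           # P(M = (1,2)) at one marker

# One-marker outcomes under H_p as (LR, probability).
outcomes = [
    ((1 - q**2) / p_m12, q**2 * (1 - q**2)),  # S = 2/2, M = (1,2)
    ((1 - p**2) / p_m12, p**2 * (1 - p**2)),  # S = 1/1, M = (1,2)
    (1 / p_m12, 2 * p * q),                   # S = 1/2, M = (1,2)
    (1 / q**2, q**4),                         # S = 2/2, M = (2)
    (1 / p**2, p**4),                         # S = 1/1, M = (1)
]

def compositions(n, k):
    """Yield all tuples of k non-negative integers that sum to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

rmne_1 = sum(pr / l for l, pr in outcomes)    # one-marker RMNE
threshold = 1 / rmne_1**n_markers             # 1/RMNE for all markers together

total = F(0)
prob_below = F(0)
for counts in compositions(n_markers, len(outcomes)):
    coef = math.factorial(n_markers)
    for c in counts:
        coef //= math.factorial(c)            # multinomial coefficient
    prob = coef * math.prod(pr**c for c, (_, pr) in zip(counts, outcomes))
    lr = math.prod(l**c for c, (l, _) in zip(counts, outcomes))
    total += prob
    if lr <= threshold:
        prob_below += prob

print(float(prob_below))                      # P(LR(H_p) <= 1/RMNE)
```

Grouping markers by outcome keeps the enumeration to a few thousand count vectors instead of all \(5^{14}\) marker-by-marker combinations.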

Based on the accurate approximation provided by sampling, we proceed to a larger number of markers, accepting that exact calculations are not possible. The asymptotic lognormal approximation discussed in [12] worked reasonably well for 14 markers (details omitted) and is expected to improve for a larger number of markers. We choose the number of markers so that the power is comparable to the numerical results for the NGM markers in the main text, and this is achieved for \(-8.47135/Log_{10}(RMNE)\approx 176\) markers. Figure 8 shows the results.

Fig. 8

Plot of \(P(\text {Log}_{10}(LR(\mathcal {H}_{p})) \leq x)\) as a function of x for two-person mixtures, typed on 176 i.i.d. SNP markers with minor allele frequency 0.4. The asymptotic distribution (dashed curve) can hardly be distinguished from the simulation-based distribution (solid curve). The dashed vertical line indicates \(Log_{10}(1/RMNE)\), which is bounded from above by \(Log_{10} (E(LR(\mathcal {H}_{p})))\), the solid vertical line.

Appendix C: Non-standard hypotheses

Finally, we look at two examples of non-standard likelihood ratios, where the hypotheses \(H_{p}\) and \(H_{d}\) differ in more than the presence of the person of interest in the mixture. First, we let the models themselves differ in the most general way, i.e. the number of contributors, the known contributors, the dropout probabilities and the drop-in parameter need not be the same. Second, we consider the case where the hypotheses agree on the number of contributors, their dropout probabilities and the drop-in parameter, but where one contributor is known under \(H_{p}\) and none under \(H_{d}\). Then (2.7) still holds, and we show how the random variable approach of this paper applies in that situation.

C.1 Different models per hypothesis

We have thus far assumed that the probability model for both hypotheses is the same, i.e., that \(H_{p}\) and \(H_{d}\) specify the same number of contributors, dropout probabilities, etc. In this example, we show that this is not strictly necessary, but that the probabilities involving \(\mathcal {H}_{p}\) and \(\mathcal {H}_{d}\) then have a different interpretation in terms of \(H_{p}\) and \(H_{d}\). Suppose, for example, that \(H_{p}\), resp. \(H_{d}\), states that there are n, resp. \(n^{\prime }\), contributors with dropout vector \(\mathbf {d}\), resp. \(\mathbf {d}^{\prime }\), and drop-in parameter c, resp. \(c^{\prime }\), and that furthermore the suspect is a contributor according to \(H_{p}\) but not according to \(H_{d}\). We also suppose that some contributors are known according to \(H_{p}\), and some (possibly others) according to \(H_{d}\). We now set

$$P(\mathcal{H}_{p}=g)= P_{\mathbf{d},c}(\mathcal{S}=g \mid\mathcal{M}=M,\mathcal{S}=\mathcal{C}_{1},\mathcal{C}_{T}=\mathbf{g}_{T}), $$
$$P(\mathcal{H}_{d}=g)= P_{\mathbf{d}^{\prime},c^{\prime}}(\mathcal{S}=g \mid\mathcal{M}=M,\mathcal{S} \sim\mathcal{G},\mathcal{C}_{T^{\prime}}=\mathbf{g}^{\prime}_{T^{\prime}}).$$

In that case,

$$\begin{array}{@{}rcl@{}} LR_{\mathcal{H}_{p},\mathcal{H}_{d}}(g)&=&\frac{P(\mathcal{H}_{p}=g)}{P(\mathcal{H}_{d}=g)}\\ &=&\frac{P_{\mathbf{d},c}(\mathcal{S}=g \mid\mathcal{M}=M,\mathcal{S}=\mathcal{C}_{1},\mathcal{C}_{T}=\mathbf{g}_{T})}{P_{\mathbf{d}^{\prime},c^{\prime}}(\mathcal{S}=g \mid\mathcal{M}=M,\mathcal{S} \sim\mathcal{G},\mathcal{C}_{T^{\prime}}=\mathbf{g}^{\prime}_{T^{\prime}})}\\ &=& \frac{P_{\mathbf{d},c}(\mathcal{M}=M \mid\mathcal{S}=g,\mathcal{S}=\mathcal{C}_{1},\mathcal{C}_{T}=\mathbf{g}_{T})P_{\mathbf{d},c}(\mathcal{S}=g \mid\mathcal{S}=\mathcal{C}_{1},\mathcal{C}_{T}=\mathbf{g}_{T})}{P_{\mathbf{d},c}(\mathcal{M}=M \mid\mathcal{S}=\mathcal{C}_{1},\mathcal{C}_{T}=\mathbf{g}_{T})}\\ && \cdot \frac{P_{\mathbf{d}^{\prime},c^{\prime}}(\mathcal{M}=M \mid\mathcal{S} \sim\mathcal{G},\mathcal{C}_{T^{\prime}}=\mathbf{g}^{\prime}_{T^{\prime}})}{P_{\mathbf{d}^{\prime},c^{\prime}}(\mathcal{M}=M \mid\mathcal{S} =g,\mathcal{S}\sim\mathcal{G},\mathcal{C}_{T^{\prime}}=\mathbf{g}^{\prime}_{T^{\prime}})P_{\mathbf{d}^{\prime},c^{\prime}}(\mathcal{S}=g \mid\mathcal{S} \sim\mathcal{G},\mathcal{C}_{T^{\prime}}=\mathbf{g}^{\prime}_{T^{\prime}})}\\ &=& \frac{P_{\mathbf{d},c}(\mathcal{M}=M \mid\mathcal{S}=g,\mathcal{S}=\mathcal{C}_{1},\mathcal{C}_{T}=\mathbf{g}_{T})}{P_{\mathbf{d}^{\prime},c^{\prime}}(\mathcal{M}=M\mid\mathcal{S}=g,\mathcal{S} \sim\mathcal{G},\mathcal{C}_{T^{\prime}}=\mathbf{g}^{\prime}_{T^{\prime}})}\\ &&\cdot\frac{P_{\mathbf{d},c}(\mathcal{S}=g \mid\mathcal{S}=\mathcal{C}_{1},\mathcal{C}_{T}=\mathbf{g}_{T})}{P_{\mathbf{d}^{\prime},c^{\prime}}(\mathcal{S}=g \mid\mathcal{S}\sim\mathcal{G},\mathcal{C}_{T^{\prime}}=\mathbf{g}^{\prime}_{T^{\prime}})}\\ &&\cdot \frac{P_{\mathbf{d}^{\prime},c^{\prime}}(\mathcal{M}=M\mid\mathcal{S} \sim\mathcal{G},\mathcal{C}_{T^{\prime}}=\mathbf{g}^{\prime}_{T^{\prime}})}{P_{\mathbf{d},c}(\mathcal{M}=M \mid\mathcal{S}=\mathcal{C}_{1},\mathcal{C}_{T}=\mathbf{g}_{T})}\\ &=& LR_{H_{p},H_{d}}(g)\frac{P_{\mathbf{d}^{\prime},c^{\prime}}(\mathcal{M}=M\mid \mathcal{C}_{T^{\prime}}=\mathbf{g}^{\prime}_{T^{\prime}})}{P_{\mathbf{d},c}(\mathcal{M}=M \mid\mathcal{C}_{T}=\mathbf{g}_{T})} \end{array} $$

since we assume that in the absence of mixture data, the marginal distributions of \(\mathcal {S}\) are the same for both hypotheses. We see that \(LR_{\mathcal {H}_{p},\mathcal {H}_{d}}(g)=P(\mathcal {H}_{p}=g)/P(\mathcal {H}_{d}=g)\) can again be interpreted as the realization of a random variable, but now of a random likelihood ratio that tests both the presence of the suspect (through \(LR_{H_{p},H_{d}}(g)\)) and the change in model (through the second factor). Thus, while the general framework presented in section “The likelihood ratio as a random variable” still applies, the relation between \(\mathcal {H}_{p},\mathcal {H}_{d}\) and \(H_{p},H_{d}\) is not the same as when \(H_{p}\) and \(H_{d}\) define the same probabilistic model: the random variable \(P(\mathcal {H}_{p}=g)/P(\mathcal {H}_{d}=g)\) is now a simultaneous test for the contribution of the suspect and for the two different models stating the composition of the mixture.

Note that (2.7) need not apply in this situation. If it does, then the results of section “The likelihood ratio as a random variable” apply to \(LR_{\mathcal {H}_{p},\mathcal {H}_{d}}\) and not necessarily to the original LR. Also, because two separate issues are tested simultaneously, (2.21) no longer holds. Indeed, the mixture likelihood under \(H_{d}\) can be made arbitrarily small: if, for instance, \(H_{d}\) stipulates that all dropout rates are equal to one, then all detected alleles must be the result of drop-in, and the likelihood of this happening can be made arbitrarily small by decreasing c.

C.2 Testing two donors simultaneously

We now illustrate the preceding results in the case where the hypotheses \(H_{p}\) and \(H_{d}\) differ in more than one alleged contributor. Consider as an example a two-person mixture where \(H_{p}\) states that the mixture consists of suspect S and victim V, and \(H_{d}\) states that its contributors are two unknown individuals. In that case, a large LR in favour of \(H_{p}\) does not necessarily imply that the evidence against S is strong. In [8], it is proposed to carry out a simulation experiment in which S is replaced with a random person and the LRs are calculated. This may produce many large LRs. At first glance, this may seem to contradict (2.14), but this is actually not the case.

To see this, we recast this example in our random variable framework. We suppose that we have chosen parameters for the binary model, that the mixture is a two-person mixture and that there is no relatedness to take into account, and we let, as before, \(\mathcal {C}_{1}\) and \(\mathcal {C}_{2}\) be the random variables that describe the genotypes of the first and second donor. Let \(g_{V}\) be the observed genotype of the victim. According to \(H_{p}\), the second donor has genotype \(g_{V}\), and according to \(H_{d}\), there are no known contributors. In order to define \(\mathcal {H}_{p}\) and \(\mathcal {H}_{d}\), we need them to have the same sample space, but \(\mathcal {H}_{p}\) needs to keep the profile of the second donor equal to that of the victim. Thus, we let

$$\begin{array}{@{}rcl@{}} P(\mathcal{H}_{p}=(g_{1},g_{2}))&=& P(\mathcal{C}_{1}=g_{1},\mathcal{C}_{2}=g_{2} \mid\mathcal{M}=M,\mathcal{C}_{2}=g_{V}) \\ &=& \left\{\begin{array}{ll} 0 & \text{if } g_{2} \neq g_{V}, \\ P(\mathcal{C}_{1}=g_{1} \mid\mathcal{C}_{2}=g_{V},\mathcal{M}=M) & \text{if } g_{2}=g_{V}, \end{array}\right. \end{array} $$
$$\begin{array}{@{}rcl@{}} P(\mathcal{H}_{d}=(g_{1},g_{2}))&=&P(\mathcal{S}_{1}=g_{1},\mathcal{S}_{2}=g_{2} \mid\mathcal{M}=M,\mathcal{S}_{1} \sim\mathcal{G},\mathcal{S}_{2} \sim\mathcal{G})\\ &=&f_{g_{1}}f_{g_{2}}, \end{array} $$

where \(f_{g}\) is the population frequency of profile g. Then \(LR_{\mathcal {H}_{p},\mathcal {H}_{d}}(g_{1},g_{2})=0\) unless \(g_{2}=g_{V}\), and in that case,

$$\begin{array}{@{}rcl@{}} LR_{\mathcal{H}_{p},\mathcal{H}_{d}}(g_{1},g_{2})&=&\frac{P(\mathcal{H}_{p}=(g_{1},g_{2}))}{P(\mathcal{H}_{d}=(g_{1},g_{2}))}\\ &=&\frac{P(\mathcal{M}=M\mid\mathcal{C}_{1}=g_{1},\mathcal{C}_{2}=g_{V})}{P(\mathcal{M}=M)}\cdot\frac{1}{P(\mathcal{C}_{2}=g_{V}\mid\mathcal{M}=M)}\\ &=&LR_{H_{p},H_{d}}(g_{1})\frac{1}{P(\mathcal{C}_{2}=g_{V} \mid\mathcal{M}=M)}. \end{array} $$

In this product, the first term corresponds to the LR calculated with the original hypotheses \(H_{p}\) and \(H_{d}\) (mixture of suspect with genotype \(g_{1}\) and victim with genotype \(g_{V}\) versus mixture of two unknowns), which we denote again by \(LR_{H_{p},H_{d}}(g_{1})\). We still have

$$P(LR(\mathcal{H}_{d}) > x) < \frac{1}{x},$$

but it is extremely unlikely that \(\mathcal {H}_{d}\) selects \(g_{2}=g_{V}\) as second genotype. If we condition on this, we get for x>0, after some calculations, that for a genotype \(g_{1}\) drawn at random from the population,

$$ P(LR_{H_{p},H_{d}}(g_{1}) > x)< \frac{1}{x}\frac{P(\mathcal{C}_{2}=g_{V} \mid \mathcal{M}=M)}{P(\mathcal{C}_{2}=g_{V})}. $$
(C.1)
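The calculations leading to (C.1) can be sketched as follows. This is one possible route, reconstructed from the definitions above rather than taken from the original derivation, and it assumes \(P(\mathcal{C}_{2}=g_{V})=f_{g_{V}}\). Since \(LR_{\mathcal{H}_{p},\mathcal{H}_{d}}(g_{1},g_{2})=0\) unless \(g_{2}=g_{V}\), only pairs with \(g_{2}=g_{V}\) contribute for x>0:

$$\begin{array}{@{}rcl@{}} \frac{1}{x} &>& P(LR(\mathcal{H}_{d}) > x)\\ &=& \sum\limits_{g_{1}} f_{g_{1}}f_{g_{V}}\, 1\left\{\frac{LR_{H_{p},H_{d}}(g_{1})}{P(\mathcal{C}_{2}=g_{V} \mid\mathcal{M}=M)} > x\right\}\\ &=& f_{g_{V}}\, P\left(LR_{H_{p},H_{d}}(g_{1}) > x\, P(\mathcal{C}_{2}=g_{V} \mid\mathcal{M}=M)\right). \end{array} $$

Writing \(x^{\prime}=x\,P(\mathcal{C}_{2}=g_{V} \mid\mathcal{M}=M)\) and dividing by \(f_{g_{V}}\) gives

$$P(LR_{H_{p},H_{d}}(g_{1}) > x^{\prime}) < \frac{1}{x\, f_{g_{V}}} = \frac{1}{x^{\prime}}\frac{P(\mathcal{C}_{2}=g_{V} \mid \mathcal{M}=M)}{P(\mathcal{C}_{2}=g_{V})}, $$

which is (C.1) with \(x^{\prime}\) renamed to x.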

Note that

$$\frac{P(\mathcal{C}_{2}=g_{V} \mid\mathcal{M}=M)}{P(\mathcal{C}_{2}=g_{V})}=\frac{P(\mathcal{M}=M \mid\mathcal{C}_{2}=g_{V})}{P(\mathcal{M}=M)} $$

is the LR in favour of the second contributor having genotype \(g_{V}\). Thus, the bound (C.1) is not useful when there is strong evidence that V is the second contributor. On the other hand, when there is evidence that V is not a contributor, then (C.1) may be informative.

Cite this article

Slooten, KJ., Egeland, T. Exclusion probabilities and likelihood ratios with applications to mixtures. Int J Legal Med 130, 39–57 (2016). https://doi.org/10.1007/s00414-015-1217-z

Keywords

  • DNA mixtures
  • Weight of evidence
  • Exclusion probabilities