Abstract
The statistical evidence obtained from mixed DNA profiles can be summarised in several ways in forensic casework, including the likelihood ratio (LR) and the Random Man Not Excluded (RMNE) probability. The literature contains a discussion of the advantages and disadvantages of likelihood ratios and exclusion probabilities, and part of our aim is to bring some clarification to this debate. In a previous paper, we proved that there is a general mathematical relationship between these statistics: RMNE can be expressed as a certain average of the LR, implying that the expected value of the LR, when applied to an actual contributor to the mixture, is at least equal to the inverse of the RMNE. While that paper presented applications to kinship problems, the current paper demonstrates the relevance for mixture cases, and for this purpose we prove some new general properties. We also demonstrate how to use the distribution of the likelihood ratio for donors of a mixture to obtain estimates for exceedance probabilities of the LR for non-donors, of which the RMNE is the special case corresponding to LR>0. To derive these results, we need to view the likelihood ratio as a random variable, and in this paper we describe how such a randomization can be achieved. The RMNE is usually invoked only for mixtures without dropout, whereas artefacts like dropout and drop-in are commonly encountered in mixtures. We address this situation too, illustrating our results with a basic but widely implemented model, a so-called binary model. The precise definitions, modelling and interpretation of the required concepts of dropout and drop-in are not entirely obvious, and we attempt to clarify them here in a general likelihood framework for a binary model.
References
Balding D, Buckleton J (2009) Interpreting low template DNA profiles. Forensic Sci Int Genet 4(1):1–10
Buckleton J, Curran J (2008) A discussion of the merits of random man not excluded and likelihood ratios. Forensic Sci Int Genet 2:343–348
Buckleton J, Triggs C, Walsh S (eds.) (2005) Forensic DNA Evidence Interpretation. CRC Press, Florida, USA
Cowell R, Graversen T, Lauritzen S, Mortera J (2015) Analysis of forensic DNA mixtures with artefacts. J R Stat Soc Ser C Appl Stat 64(1):1–48
Curran J, Gill P, Bill M (2005) Interpretation of repeat measurement DNA evidence allowing for multiple contributors and population substructure. Forensic Sci Int 148(1):47–53
Dørum G, Kling D, Baeza-Richer C, Magariños MG, Sæbø S, Desmyter S, Egeland T (2014) Models and implementation for relationship problems with dropout. Int J Legal Med 129(3):411–423
Gill P, Gusmão L, Haned H, Mayr W, Morling N, Parson W, Prieto L, Prinz M, Schneider H, Schneider P, Weir B (2012) DNA commission of the International Society of Forensic Genetics: Recommendations on the evaluation of STR typing results that may include drop-out and/or drop-in using probabilistic methods. Forensic Sci Int Genet 6(6):679–688
Gill P, Haned H (2013) A new methodological framework to interpret complex DNA profiles using likelihood ratios. Forensic Sci Int Genet 7:251–263
Haned H, Slooten K, Gill P (2012) Exploratory data analysis for the interpretation of low template DNA mixtures. Forensic Sci Int Genet 6(6):762–774
Kruijver M (2015) Efficient computations with the likelihood ratio distribution. Forensic Sci Int Genet 14:116–124
Kruijver M, Meester R, Slooten K (2015) P-values should not be used for evaluating the strength of DNA evidence. Forensic Sci Int Genet 16:226–231
Nothnagel M, Schmidtke J, Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med 124(3):205–215
Slooten K, Meester R (2011) Forensic identification: the Island Problem and its generalizations. Statistica Neerlandica 65:202–237
Slooten K, Egeland T (2014) Exclusion probabilities and likelihood ratios with applications to kinship problems. Int J Legal Med 128(3):415–425
Slooten K, Meester R (2014) Probabilistic strategies for familial DNA searching. J R Stat Soc Ser C Appl Stat 63(3):361–384
Steele C, Balding D (2014) Statistical evaluation of forensic DNA profile evidence. Annual Review of Statistics and Its Application 1:361–384
Thompson E (2000) Statistical inference from genetic data on pedigrees. In: NSF-CBMS regional conference series in probability and statistics. JSTOR
Westen A, Kraaijenbrink T, de Medina AR, Harteveld J, Willemse P, Zuniga S, van der Gaag K, Weiler N, Warnaar J, Kayser M, Sijen T, de Knijff P (2014) Comparing six commercial autosomal STR kits in a large Dutch population sample. Forensic Sci Int Genet 10:55–63
Acknowledgments
The work of the second author leading to these results was financially supported by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 285487 (EUROFORGEN-NoE).
Appendices
Appendix A: LR properties
In this appendix, we prove properties (2.13) and (2.15). The proof of the first generalizes the proof of Eq. 2.11 in [14].
Note that we need x>0 in (2.13), and that the inequalities may be replaced by strict inequalities.
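A short derivation sketch shows where the condition x>0 enters. It is our own reconstruction, not necessarily the proof the authors had in mind: we read (2.13) as the identity that the numerical check 0.4563 = 0.6352 · 0.7183 in Appendix B.1 confirms, assume a discrete sample space, write \(LR(g)=P(\mathcal {H}_{p}=g)/P(\mathcal {H}_{d}=g)\), and use the convention 1/∞ = 0.

```latex
\begin{align*}
P\bigl(LR(\mathcal{H}_d)\ge x\bigr)
  &= \sum_{g:\,LR(g)\ge x} P(\mathcal{H}_d=g)
   = \sum_{g:\,LR(g)\ge x} \frac{P(\mathcal{H}_p=g)}{LR(g)}
   && \text{($x>0$ forces $P(\mathcal{H}_p=g)>0$)}\\
  &= E\Bigl[\frac{\mathbf{1}\{LR(\mathcal{H}_p)\ge x\}}{LR(\mathcal{H}_p)}\Bigr]
   = P\bigl(LR(\mathcal{H}_p)\ge x\bigr)\,
     E\Bigl[\frac{1}{LR(\mathcal{H}_p)}\Bigm| LR(\mathcal{H}_p)\ge x\Bigr].
\end{align*}
```

The last step assumes \(P(LR(\mathcal {H}_{p})\ge x)>0\). Because \(1/LR(\mathcal {H}_{p})\le 1/x\) on the event in question, the same computation gives \(P(LR(\mathcal {H}_{d})\ge x)\le P(LR(\mathcal {H}_{p})\ge x)/x\le 1/x\), and replacing every ≥ by > yields the strict version.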
As for the variance,
we compute both terms separately. First, we have
The other term equals
Combining these results completes the argument.
Appendix B: Detailed calculations for SNPs
We explain some of the mathematical expressions using a simple example without dropout or drop-in. There are two contributors to a mixture, and the question is whether a person S has contributed (corresponding to \(H_{p}\)) or not (corresponding to \(H_{d}\)). We assume the contributors to be unrelated, and the suspect to be either a contributor or unrelated to the contributors. This means, for example, that \(P(\mathcal {M}=M \mid \mathcal {S}=g, \mathcal {S} \neq \mathcal {C}_{1})=P(\mathcal {M}=M)\): because a non-contributor is unrelated to the contributors, his or her genotype does not influence the likelihood of the mixture. Similarly, \(P(\mathcal {C}_{i}=g)=f_{g}\) for both contributors, where \(f_{g}\) is the population frequency of genotype g. We work with the hypotheses as random variables according to Eqs. 2.4 and 2.5, meaning that we also regard the mixture itself as random.
B.1 One marker
We first consider a single SNP marker, with alleles denoted 1 and 2 having frequencies p and 1−p, where p=0.4 in the numerical examples. We next illustrate Eqs. 2.11–2.15. Table 1 shows the possible mixtures along with their probabilities and RMNE, and Table 2 gives the distribution of LR.
As an example, note that LR attains its maximal value \(1/p^{2}=6.25\) when both contributors are homozygous for the rarer allele, and this occurs with probabilities
In Table 3, we give the distributions conditional on the LR being greater than one.
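The quantities in Tables 1–3 can be recomputed by brute-force enumeration. The Python sketch below is our own illustration, not code from the paper; it assumes Hardy-Weinberg genotype frequencies, two unrelated donors, a suspect who is either donor 1 (for \(LR(\mathcal {H}_{p})\)) or unrelated to both donors (for \(LR(\mathcal {H}_{d})\)), and no dropout or drop-in. The helpers `mixture_prob`, `rmne` and `lr_distribution` are hypothetical names introduced here.

```python
from collections import defaultdict
from itertools import product

P1 = 0.4                   # frequency p of allele 1, the rarer allele
FREQ = {1: P1, 2: 1 - P1}  # allele frequencies
# Hardy-Weinberg genotype frequencies f_g; genotypes are sorted allele tuples.
F = defaultdict(float)
for a, b in product(FREQ, repeat=2):
    F[tuple(sorted((a, b)))] += FREQ[a] * FREQ[b]
MIXTURES = [frozenset({1}), frozenset({2}), frozenset({1, 2})]


def mixture_prob(mix, known=None):
    """P(M = mix) for two donors without dropout or drop-in; if `known` is
    given, it is the genotype of donor 1 and only donor 2 is random."""
    first = {known: 1.0} if known is not None else F
    return sum(w1 * w2
               for (g1, w1), (g2, w2) in product(first.items(), F.items())
               if set(g1) | set(g2) == mix)


def rmne(mix):
    """Probability that a random person carries no allele outside `mix`."""
    return sum(FREQ[a] for a in mix) ** 2


def lr_distribution(suspect_is_donor):
    """{LR value: probability} of LR(H_p) (suspect is donor 1) or of LR(H_d)
    (suspect unrelated to both donors); the mixture is random in both cases."""
    dist = defaultdict(float)
    for mix, (g, fg) in product(MIXTURES, F.items()):
        p_given_g = mixture_prob(mix, known=g)
        lr = p_given_g / mixture_prob(mix)
        weight = fg * (p_given_g if suspect_is_donor else mixture_prob(mix))
        if weight > 0:
            dist[round(lr, 10)] += weight
    return dict(dist)


for mix in MIXTURES:  # Table 1: mixture, P(M), RMNE(M)
    print(sorted(mix), round(mixture_prob(mix), 4), round(rmne(mix), 4))
lr_p = lr_distribution(True)  # Table 2, suspect is a donor
above = {v: w for v, w in lr_p.items() if v > 1}  # Table 3: LR > 1
p_above = sum(above.values())
print(round(p_above, 4), round(sum(w / v for v, w in above.items()) / p_above, 4))
```

Under these assumptions the first three lines list P(M) = 0.0256, 0.1296, 0.8448 and RMNE(M) = 0.16, 0.36, 1.0 for the mixtures {1}, {2} and {1, 2}, and the last line gives 0.6352 and 0.7183, the two factors that appear in the check of (2.13). The value LR = 6.25 carries probability \(p^{4}=0.0256\) in `lr_distribution(True)`.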
We first confirm (2.11) with input from Tables 1 and 2:
Note that
which exceeds 1/RMNE=1.1166, as it should according to Eq. 2.12. Note that, since we are working with random variables as in Eqs. 2.4 and 2.5, the RMNE probability is an average over all possible mixtures.
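The two relations can also be checked in closed form for other allele frequencies. We read (2.11) as \(E[1/LR(\mathcal {H}_{p})]=\mathrm{RMNE}\) and (2.12) as \(E[LR(\mathcal {H}_{p})]\ge 1/\mathrm{RMNE}\), in line with the abstract. The sketch below (our own, with hypothetical helper names) lists the five possible (suspect genotype, mixture) outcomes when the suspect is donor 1, hand-derived under the same assumptions as above: Hardy-Weinberg equilibrium, two unrelated donors, no dropout or drop-in.

```python
import math


def lr_p_atoms(p):
    """(probability, LR) pairs of LR(H_p) for one SNP with allele frequencies
    p and q = 1 - p, two unrelated donors, suspect = donor 1, no dropout."""
    q = 1 - p
    both = 1 - p**4 - q**4  # P(mixture shows alleles 1 and 2)
    return [
        (p**4, 1 / p**2),                        # suspect 11, mixture {1}
        (q**4, 1 / q**2),                        # suspect 22, mixture {2}
        (p**2 * (1 - p**2), (1 - p**2) / both),  # suspect 11, mixture {1,2}
        (q**2 * (1 - q**2), (1 - q**2) / both),  # suspect 22, mixture {1,2}
        (2 * p * q, 1 / both),                   # suspect 12, mixture {1,2}
    ]


def average_rmne(p):
    """RMNE averaged over the random mixture: sum over M of P(M) * RMNE(M)."""
    q = 1 - p
    return p**4 * p**2 + q**4 * q**2 + (1 - p**4 - q**4) * 1.0


for p in (0.1, 0.25, 0.4):
    atoms = lr_p_atoms(p)
    e_inv_lr = sum(w / lr for w, lr in atoms)  # E[1/LR(H_p)]
    e_lr = sum(w * lr for w, lr in atoms)      # E[LR(H_p)]
    rmne = average_rmne(p)
    assert math.isclose(e_inv_lr, rmne)  # (2.11), as we read it
    assert e_lr >= 1 / rmne              # (2.12), by Jensen's inequality
    print(p, round(rmne, 4), round(1 / rmne, 4), round(e_lr, 4))
```

For p = 0.4 the last printed line reads `0.4 0.8956 1.1166 1.3964`, so \(E[LR(\mathcal {H}_{p})]\approx 1.3964\) exceeds 1/RMNE = 1.1166.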
Consider next (2.13), i.e.
We exemplify for x=1. From Tables 2 and 3
which confirms (2.13), i.e. 0.4563=0.6352⋅0.7183.
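Reading (2.13) as \(P(LR(\mathcal {H}_{d})>x)=P(LR(\mathcal {H}_{p})>x)\,E[1/LR(\mathcal {H}_{p})\mid LR(\mathcal {H}_{p})>x]\), which is what the numbers above confirm, the right-hand side needs only the LR distribution for true donors. The sketch below (the function name `nondonor_exceedance` is hypothetical) evaluates it from a table of \(LR(\mathcal {H}_{p})\) values and compares the result with the exceedance probability computed directly for non-donors; the hand-derived outcome table assumes the same single-SNP setting as above.

```python
def nondonor_exceedance(donor_dist, x):
    """Return (estimate, P(LR_p > x), E[1/LR_p | LR_p > x]), where the estimate
    of P(LR(H_d) > x) uses only donor_dist = {LR(H_p) value: probability}."""
    if x <= 0:
        raise ValueError("the identity requires x > 0")
    tail = {lr: w for lr, w in donor_dist.items() if lr > x}
    p_tail = sum(tail.values())
    if p_tail == 0:
        return 0.0, 0.0, float("nan")
    cond_inv = sum(w / lr for lr, w in tail.items()) / p_tail
    return p_tail * cond_inv, p_tail, cond_inv


p = 0.4
q = 1 - p
both = 1 - p**4 - q**4  # P(mixture shows alleles 1 and 2)
# LR value -> (probability if the suspect is donor 1, probability if unrelated);
# outcomes with LR = 0 are omitted because they never exceed x > 0.
atoms = {
    1 / p**2: (p**4, p**4 * p**2),                        # mixture {1}, suspect 11
    1 / q**2: (q**4, q**4 * q**2),                        # mixture {2}, suspect 22
    (1 - p**2) / both: (p**2 * (1 - p**2), both * p**2),  # mixture {1,2}, suspect 11
    (1 - q**2) / both: (q**2 * (1 - q**2), both * q**2),  # mixture {1,2}, suspect 22
    1 / both: (2 * p * q, both * 2 * p * q),              # mixture {1,2}, suspect 12
}
donor_dist = {lr: wp for lr, (wp, _) in atoms.items()}

for x in (0.5, 1.0, 2.0, 5.0):
    est, p_tail, cond_inv = nondonor_exceedance(donor_dist, x)
    direct = sum(wd for lr, (_, wd) in atoms.items() if lr > x)
    print(x, round(p_tail, 4), round(cond_inv, 4), round(est, 4), round(direct, 4))
```

For x = 1 the row reproduces 0.6352, 0.7183 and 0.4563 (twice). For x = 0.5 every donor LR exceeds x, so the estimate reduces to the averaged RMNE, 0.8956.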
Consider next (2.15), i.e.
From Table 3,
On the other hand,
and so the right-hand side of Eq. 2.15 becomes
as it should.
Table 4 illustrates the different interpretations of LR as a random variable discussed in section “The likelihood ratio as a random variable”.
The details for the first line of the table are
B.2 Several markers
We next include several independent markers, each with the same allele frequencies as the single marker above, and first calculate \(P(\log_{10}(LR(\mathcal {H}_{p})) \leq x)\) both exactly and approximately. Exact calculations are possible in this case with functions in the R package DNAprofiles (some specific code is given at the end of this section) if the number of markers does not exceed roughly 14; for a general number of markers, we can always obtain the distribution by sampling or by the normal approximation. Figure 7 shows the distribution of \(LR(\mathcal {H}_{p})\). The vertical dashed line shows
and the upper bound
is the vertical solid line to the right.
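For this toy setting the exact distribution can also be obtained without special software, because \(\log_{10}LR(\mathcal {H}_{p})\) is a sum of independent, identically distributed per-marker terms taking only five values. The Python sketch below is our own stand-in for the DNAprofiles computations mentioned above; it does not use or mimic that package's API. It convolves the per-marker distribution n times, merging sums after rounding, and evaluates the cumulative probability. We take the RMNE for n markers to be the product of the per-marker mixture-averaged values, which holds under independence; `marker_atoms`, `log10_lr_distribution` and `cdf` are hypothetical names.

```python
import math
from collections import defaultdict

P = 0.4  # frequency of allele 1 on every marker


def marker_atoms(p=P):
    """Per-marker (probability, log10 LR(H_p)) pairs for the SNP example of B.1:
    two unrelated donors, suspect = donor 1, no dropout (hand-derived)."""
    q = 1 - p
    both = 1 - p**4 - q**4  # P(mixture shows alleles 1 and 2)
    pairs = [(p**4, 1 / p**2), (q**4, 1 / q**2),
             (p**2 * (1 - p**2), (1 - p**2) / both),
             (q**2 * (1 - q**2), (1 - q**2) / both),
             (2 * p * q, 1 / both)]
    return [(w, math.log10(lr)) for w, lr in pairs]


def log10_lr_distribution(n, p=P, digits=9):
    """Exact {log10 LR(H_p): probability} over n independent markers, by
    repeated discrete convolution; sums are rounded so equal values merge."""
    atoms = marker_atoms(p)
    dist = {0.0: 1.0}
    for _ in range(n):
        nxt = defaultdict(float)
        for s, w in dist.items():
            for wm, l in atoms:
                nxt[round(s + l, digits)] += w * wm
        dist = dict(nxt)
    return dist


def cdf(dist, x):
    """P(log10 LR(H_p) <= x) for a distribution returned above."""
    return sum(w for s, w in dist.items() if s <= x)


n = 14
dist = log10_lr_distribution(n)
q = 1 - P
rmne_marker = P**6 + q**6 + (1 - P**4 - q**4)  # mixture-averaged RMNE per marker
log10_inv_rmne = -n * math.log10(rmne_marker)  # log10(1/RMNE) for n markers
print(len(dist), cdf(dist, 0.0), cdf(dist, log10_inv_rmne))
```

The printed values are the number of distinct LR values and the probabilities \(P(LR(\mathcal {H}_{p})\le 1)\) and \(P(LR(\mathcal {H}_{p})\le 1/\mathrm{RMNE})\) for 14 markers under these assumptions. The state space grows only polynomially here because all markers share one five-point distribution; with marker-specific frequencies the number of distinct sums grows much faster.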
Consider next \(P(LR(\mathcal {H}_{d})>x)\). Table 5 includes the exact values and the estimates based on importance sampling.
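Sampling non-donors directly is inefficient for large x, because \(LR(\mathcal {H}_{d})\) rarely exceeds x. Under our reading of (2.13), \(P(LR(\mathcal {H}_{d})>x)=E[\mathbf{1}\{LR(\mathcal {H}_{p})>x\}/LR(\mathcal {H}_{p})]\), so one can instead sample true donors and average \(\mathbf{1}\{LR>x\}/LR\). The sketch below illustrates this importance-sampling idea for the independent SNP markers of this appendix; it is our own illustration with hypothetical names and a fixed seed, not the code behind Table 5.

```python
import math
import random


def marker_lr_atoms(p=0.4):
    """Per-marker (probability, LR(H_p)) pairs for the SNP example of B.1
    (two unrelated donors, suspect = donor 1, no dropout; hand-derived)."""
    q = 1 - p
    both = 1 - p**4 - q**4
    return [(p**4, 1 / p**2), (q**4, 1 / q**2),
            (p**2 * (1 - p**2), (1 - p**2) / both),
            (q**2 * (1 - q**2), (1 - q**2) / both),
            (2 * p * q, 1 / both)]


def is_exceedance(n, x, samples=100_000, p=0.4, seed=1):
    """Importance-sampling estimate of P(LR(H_d) > x) over n independent markers:
    sample LR(H_p) (the suspect is a true donor) and average 1{LR > x} / LR."""
    if x <= 0:
        raise ValueError("x must be positive")
    rng = random.Random(seed)
    weights, lrs = zip(*marker_lr_atoms(p))
    total = 0.0
    for _ in range(samples):
        lr = math.prod(rng.choices(lrs, weights=weights, k=n))
        if lr > x:
            total += 1.0 / lr
    return total / samples


print(is_exceedance(1, 1.0))   # exact value in B.1: 0.4563
print(is_exceedance(20, 1e3))
```

With one marker and x = 1 the estimate should be close to the exact 0.4563 of B.1. Every term of the average is below 1/x, so the estimate can never exceed 1/x, and donor samples land in the region LR > x far more often than non-donor samples would.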
Regarding the computation of \(P(LR(\mathcal {H}_{p}) \leq 1/RMNE)\) discussed in the main text, the code in Table 6 gives the exact answer 0.27 based on the R-package DNAprofiles.
Since sampling provides an accurate approximation, we proceed to a larger number of markers, accepting that exact calculations are no longer possible. The asymptotic lognormal approximation discussed in [12] worked reasonably well for 14 markers (details omitted) and is expected to improve for a larger number of markers. We choose the number of markers so that the power is comparable to that of the numerical results for the NGM markers in the main text; this is achieved for \(-8.47135/\log_{10}(\mathrm{RMNE})\approx 176\) markers. Figure 8 shows the results.
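Both numbers in this paragraph can be reproduced approximately. The sketch below (our own; variable names are hypothetical) computes the marker count from the per-marker mixture-averaged RMNE of B.1 and applies a normal approximation to \(\log_{10}LR(\mathcal {H}_{p})\), i.e. a lognormal approximation to \(LR(\mathcal {H}_{p})\), with mean and variance equal to n times the per-marker values. This is a plain central-limit approximation; we do not claim it coincides in detail with the approximation of [12].

```python
import math
from statistics import NormalDist

p = 0.4
q = 1 - p
both = 1 - p**4 - q**4  # P(mixture shows alleles 1 and 2)
# Per-marker (probability, LR(H_p)) pairs for the SNP example of B.1.
atoms = [(p**4, 1 / p**2), (q**4, 1 / q**2),
         (p**2 * (1 - p**2), (1 - p**2) / both),
         (q**2 * (1 - q**2), (1 - q**2) / both),
         (2 * p * q, 1 / both)]

rmne_marker = p**6 + q**6 + both  # mixture-averaged RMNE per marker
n_markers = -8.47135 / math.log10(rmne_marker)
print(round(n_markers, 1))  # the text uses about 176 markers

# Normal approximation on the log10 scale: mean n*mu and variance n*var,
# where mu and var are the per-marker moments of log10 LR(H_p).
mu = sum(w * math.log10(lr) for w, lr in atoms)
var = sum(w * math.log10(lr) ** 2 for w, lr in atoms) - mu**2
n = 176
log10_inv_rmne = -n * math.log10(rmne_marker)  # log10(1/RMNE) for n markers
approx = NormalDist(n * mu, math.sqrt(n * var)).cdf(log10_inv_rmne)
print(approx)  # approximate P(LR(H_p) <= 1/RMNE)
```

With p = 0.4 the marker count evaluates to about 176.8, consistent with the roughly 176 markers used here. Because the per-marker mean of \(\log_{10}LR(\mathcal {H}_{p})\) exceeds \(-\log_{10}\mathrm{RMNE}\), the threshold lies below the mean and the approximate probability is small.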
Appendix C: Non-standard hypotheses
Finally, we look at two examples of non-standard likelihood ratios, where the hypotheses \(H_{p}\) and \(H_{d}\) differ in more than the presence of the person of interest in the mixture. First, we allow the models themselves to differ in the most general way, i.e. the number of contributors, the known contributors, the dropout probabilities and the drop-in parameter need not be the same. Second, we consider hypotheses that agree on the number of contributors, their dropout probabilities and the drop-in parameter, but where one contributor is known under \(H_{p}\) and none under \(H_{d}\). Then (2.7) still holds, and we show how the random variable approach of this paper applies in that situation.
C.1 Different models per hypothesis
We have thus far assumed that the probability model is the same under both hypotheses, i.e., that \(H_{p}\) and \(H_{d}\) specify the same number of contributors, dropout probabilities, etc. In this example, we show that this is not strictly necessary, but that the probabilities involving \(\mathcal {H}_{p}\) and \(\mathcal {H}_{d}\) then acquire a different interpretation in terms of \(H_{p}\) and \(H_{d}\). Suppose, for example, that \(H_{p}\) and \(H_{d}\) state n and \(n^{\prime }\) contributors, dropout vectors \(\mathbf {d}\) and \(\mathbf {d}^{\prime }\), and drop-in parameters c and \(c^{\prime }\), respectively, and that furthermore the suspect is a contributor according to \(H_{p}\) but not according to \(H_{d}\). We also suppose that some contributors are known according to \(H_{p}\), and some (possibly different) ones according to \(H_{d}\). We now set
In that case,
since we assume that, in the absence of mixture data, the marginal distribution of \(\mathcal {S}\) is the same under both hypotheses. We see that \(LR_{\mathcal {H}_{p},\mathcal {H}_{d}}(g)=P(\mathcal {H}_{p}=g)/P(\mathcal {H}_{d}=g)\) can again be interpreted as the realization of a random variable, but now of a random likelihood ratio that tests both the presence of the suspect (in \(LR_{H_{p},H_{d}}(g)\)) and the change in model (represented by the second factor). Thus, while the general framework presented in section “The likelihood ratio as a random variable” still applies, the relation between \(\mathcal {H}_{p},\mathcal {H}_{d}\) and \(H_{p},H_{d}\) is not the same as when \(H_{p}\) and \(H_{d}\) define the same probabilistic model: the random variable \(P(\mathcal {H}_{p}=g)/P(\mathcal {H}_{d}=g)\) is a simultaneous test of the contribution of the suspect and of the two different models for the composition of the mixture.
Note that (2.7) need not apply in this situation. If it does, then the results of section “The likelihood ratio as a random variable” apply to \(LR_{\mathcal {H}_{p},\mathcal {H}_{d}}\) and not necessarily to the original LR. Also, because two separate issues are tested simultaneously, (2.21) need no longer hold. Indeed, the mixture likelihood under \(H_{d}\) can be made arbitrarily small. If, for instance, \(H_{d}\) stipulates that all dropout rates are equal to one, then all detected alleles must be the result of drop-in, and the likelihood of this happening can be made arbitrarily small by decreasing the drop-in parameter \(c^{\prime }\) of \(H_{d}\).
C.2 Testing two donors simultaneously
We now illustrate the preceding results in a case where the hypotheses \(H_{p}\) and \(H_{d}\) differ in more than one alleged contributor. Consider as an example a two-person mixture where \(H_{p}\) states that its contributors are suspect S and victim V, and \(H_{d}\) states that its contributors are two unknown individuals. In that case, a large LR in favour of \(H_{p}\) does not necessarily imply that the evidence against S is strong. In [8], it is proposed to carry out a simulation experiment, replacing S with a random person and calculating the LR; this may result in many large LRs. At first glance, this may seem to contradict (2.14), but this is actually not the case.
To see this, we recast this example in our random variable framework. We suppose that we have chosen parameters for the binary model, that the mixture is a two-person mixture and that there is no relatedness to take into account, and we let, as before, \(\mathcal {C}_{1}\) and \(\mathcal {C}_{2}\) be the random variables that describe the genotypes of the first and second donor. Let \(g_{V}\) be the observed genotype of the victim. According to \(H_{p}\), the second donor has genotype \(g_{V}\); according to \(H_{d}\), there are no known contributors. To define \(\mathcal {H}_{p}\) and \(\mathcal {H}_{d}\), we need them to have the same sample space, but \(\mathcal {H}_{p}\) must keep the profile of the second donor equal to that of the victim. Thus, we let
where \(f_{g}\) is the population frequency of profile g. Then \(LR_{\mathcal {H}_{p},\mathcal {H}_{d}}(g_{1},g_{2})=0\) unless \(g_{2}=g_{V}\), and in that case,
In this product, the first term corresponds to the LR calculated with the original hypotheses \(H_{p}\) and \(H_{d}\) (mixture of suspect with genotype \(g_{1}\) and victim with genotype \(g_{V}\) versus mixture of two unknowns), which we again denote by \(LR_{H_{p},H_{d}}(g_{1})\). We still have
but it is extremely unlikely that \(\mathcal {H}_{d}\) selects \(g_{2}=g_{V}\) as second genotype. If we condition on this, we obtain for x>0, after some calculations, that for a genotype \(g_{1}\) drawn at random from the population,
Note that
is the LR in favour of the second contributor having genotype \(g_{V}\). Thus, the bound (C.1) is not useful when there is strong evidence that V is the second contributor. On the other hand, when there is evidence that V is not a contributor, then (C.1) may be informative.
Cite this article
Slooten, KJ., Egeland, T. Exclusion probabilities and likelihood ratios with applications to mixtures. Int J Legal Med 130, 39–57 (2016). https://doi.org/10.1007/s00414-015-1217-z
DOI: https://doi.org/10.1007/s00414-015-1217-z