4.1 Introduction

Forensic laboratories routinely face the problem of classifying items or individuals into one of several classes or populations on the basis of available data (e.g., measurements of one or more attributes), when no control material is available for comparison. As discussed in Sect. 1.6, forensic analyses can provide valuable information regarding the category membership of a particular item. For example, it may be of interest to classify banknotes seized on a person of interest as either banknotes from general circulation or banknotes related to drug trafficking (Wilson et al., 2014). The collected material is analyzed (e.g., the degree of contamination with cocaine is measured), and results are evaluated in terms of their effect on the odds in favor of a proposition H 1 according to which the recovered items originate from a given population (e.g., banknotes in general circulation), compared to an alternative proposition H 2 according to which the recovered items originate from another population (e.g., banknotes related to drug trafficking).

An assumption made throughout this chapter is that there is a finite number of populations to which an item of interest may belong. Each population is characterized by a member of a family of probability distributions. Data can be either discrete or continuous, though examples and applications are more readily found for the latter. There are many instances where the scientific evidence is described by several variables, and available measurements take the form of multivariate data. As mentioned in Sect. 3.1, data do not always present enough regularity for standard parametric distributions (e.g., the normal model) to be used. Moreover, data may present a complex dependence structure with several levels of variation.

This chapter is structured as follows. Sections 4.2 and 4.3 address the problem of classification for various types of discrete and continuous data, respectively. Section 4.4 presents an extension to continuous multivariate data. Note that most of the examples developed in this chapter involve only two populations. An extension to more than two propositions is given in Sect. 4.2.2.

4.2 Discrete Data

This section deals with measurement results in the form of counts, using the binomial model (Sect. 4.2.1) and the multinomial model (Sect. 4.2.2).

4.2.1 Binomial Model

Imagine a case in which the issue is the quality of a consignment of Basmati rice. Basmati is a rice variety originating from the Indian subcontinent that has become valuable in international trade over recent decades. This has prompted the cultivation of high-yielding Basmati derivatives. Traditional and evolved (non-traditional) varieties, however, have distinct characteristics (e.g., Kamath et al., 2008), and distinguishing between varieties may be a relevant analytical task. Given a batch of Basmati rice of unknown type, the following pair of propositions may be of interest:

H1: The batch is traditional Basmati rice.

H2: The batch is non-traditional Basmati rice.

Denote by θ 1 and θ 2 the proportion of chalky grains in the two populations, respectively. Available counts can be treated as realizations of Bernoulli trials (Sect. 2.2.1) with constant probability of success θ 1 (θ 2). Suppose a conjugate beta prior distribution Be(α i, β i) is used to model uncertainty about θ i, where α i and β i can be elicited using the available background knowledge (as in Sect. 1.10).

Among several characteristics of interest, such as grain length, thickness, and weight, is the percentage of chalky grains, determined by counting the number of grains with a chalky area. A sample of size n is inspected, and a total of y chalky grains is observed. This count can be treated as a realization of a binomial distribution Bin(n, θ i).

The marginal distribution at the numerator and denominator can be computed as in (1.25):

$$\displaystyle \begin{aligned} \begin{array}{rcl} f_{H_i}(y)=\binom{n}{y}\frac{\varGamma(\alpha_i+\beta_i)\varGamma(\alpha_i+y)\varGamma(\beta_i+n-y)}{\varGamma(\alpha_i)\varGamma(\beta_i)\varGamma(\alpha_i+n+\beta_i)}. \end{array} \end{aligned} $$

This is a beta-binomial distribution with parameters n, α i, and β i. The Bayes factor in favor of proposition H 1 can be computed as in (1.26) and becomes

$$\displaystyle \begin{aligned} \begin{array}{rcl} \!\!\!\frac{f_{H_1}(y)}{f_{H_2}(y)}\!=\!\frac{\varGamma(\alpha_1+\beta_1)\varGamma(\alpha_1+y)\varGamma(\beta_1+n\!-\!y)\varGamma(\alpha_2)\varGamma(\beta_2)\varGamma(\alpha_2+n+\beta_2)}{\varGamma(\alpha_2+\beta_2)\varGamma(\alpha_2+y)\varGamma(\beta_2+n\!-\!y)\varGamma(\alpha_1)\varGamma(\beta_1)\varGamma(\alpha_1+n+\beta_1)}.\quad \end{array} \end{aligned} $$
(4.1)

Example 4.1 (Basmati Rice) Consider a case where 500 rice grains are examined and a total of 200 chalky grains are counted.

Suppose that the prior distribution for the proportion θ 1 of chalky grains in traditional varieties can be centered at 0.51 with a standard deviation equal to 0.19, while the proportion θ 2 of chalky grains in non-traditional varieties can be centered at 0.39 with a standard deviation equal to 0.31. The prior parameters (α i, β i) can be elicited as in (1.38) and (1.39).

We first write a function beta_prior that computes the prior parameters α i and β i according to (1.38) and (1.39).
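A minimal sketch of such a function, assuming (1.38) and (1.39) correspond to the usual moment-matching relations between the prior mean m, the prior variance v, and the beta parameters:

```r
# Moment-matching elicitation of a Be(alpha, beta) prior: given the prior
# mean m and prior variance v, solve for the two shape parameters.
beta_prior <- function(m, v) {
  k <- m * (1 - m) / v - 1   # equals alpha + beta
  alpha <- m * k
  beta  <- (1 - m) * k
  c(alpha, beta)
}
```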

The hyperparameters of the two beta distributions, say α 1, β 1, α 2, and β 2 can then be obtained straightforwardly as
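For example, using the prior means and standard deviations stated above (object names chosen here for illustration):

```r
prior1 <- beta_prior(0.51, 0.19^2)   # traditional variety (H1)
prior2 <- beta_prior(0.39, 0.31^2)   # non-traditional variety (H2)
alpha1 <- prior1[1]; beta1 <- prior1[2]
alpha2 <- prior2[1]; beta2 <- prior2[2]
```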

The beta-binomial distribution can be calculated straightforwardly using the function dbbinom that is available in the package extraDistr (Wolodzko, 2020).
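A sketch of the Bayes factor computation in (4.1), using dbbinom from extraDistr and the hyperparameters obtained above:

```r
library(extraDistr)

n <- 500   # number of grains inspected
y <- 200   # number of chalky grains observed

# Beta-binomial marginal likelihoods under H1 and H2, and their ratio as in (4.1)
BF <- dbbinom(y, n, alpha1, beta1) / dbbinom(y, n, alpha2, beta2)
BF
```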

The Bayes factor provides weak support for the hypothesis that the rice type is traditional rather than non-traditional.

4.2.2 Multinomial Model

The physical and chemical analysis of gunshot residues (GSR) is a well-established field within forensic science. GSR are commonly analyzed to help with issues regarding the distance of firing and alleged activities of persons in incidents involving the use of firearms. A study by Brozek-Mucha and Jankowicz (2001) focused on the use of GSR for discriminating between a selected number of case types (i.e., particular combinations of weapon and ammunition). The authors conducted experiments using six categories, each consisting of a specific combination of weapon and ammunition, called categories A to F. Note that the aim here is not to infer a particular weapon and ammunition as the source of recovered GSR of unknown source. The purpose is only to provide assistance in discriminating between well-defined case types (i.e., categories).

Consider the following pair of competing propositions:

H1: The gunshot residue particles are of type D (Beretta pistol and 9 mm Luger ammunition).

H2: The gunshot residue particles are of type E (Margolin pistol with Sporting 5.6 mm ammunition).

Denote by θ 1j and θ 2j the proportion of particles in given chemical classes, j = 1, …, k, characterizing categories D (i.e., category 1) and E (i.e., category 2). The numbers n 1, …, n k of particles pertaining to the distinct chemical classes 1, …, k, i.e., the chemical classes PbSbBa, PbSb, SbBa, Sb(Sn), Pb, and PbSnPb as specified in Brozek-Mucha and Jankowicz (2001), can be treated as a realization from a multinomial distribution f(n 1, …, n k ∣ θ i1, …, θ ik), i = 1, 2. A conjugate Dirichlet prior probability distribution f(θ i1, …, θ ik ∣ α i1, …, α ik) can be considered for modeling uncertainty about the proportions θ ij, i = 1, 2 (Sect. 3.2.2).

The marginal distribution at the numerator and the denominator of the Bayes factor in (1.26) can be computed as in (1.25) and becomes

$$\displaystyle \begin{aligned} \begin{array}{rcl} f_{H_i}(n_1,\dots,n_k\mid\alpha_{i1},\dots,\alpha_{ik})=\frac{\varGamma(\alpha_{i})\varGamma(n+1)}{\varGamma(n+\alpha_i)}\prod_{j=1}^{k}\frac{\varGamma(n_j+\alpha_{ij})}{\varGamma(\alpha_{ij})\varGamma(n_j+1)}, \end{array} \end{aligned} $$

where \(\alpha _i=\sum _{j=1}^{k}\alpha _{ij}\) and \(n=\sum _{j=1}^{k}n_j\). This is a Dirichlet-multinomial distribution with parameters n and α i1, …, α ik.

From a decision-theoretic point of view, the questioned items can be classified in category D (decision d 1) whenever

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \mathrm{BF }>\frac{l_1/l_2}{\pi_1/\pi_2}, \end{array} \end{aligned} $$
(4.2)

where l 1 (l 2) represents the loss incurred when decision d 1 (d 2) is erroneous, and a “0 − l i” loss function is chosen (Sect. 1.9 and Table 1.4), while π 1∕π 2 is the prior odds in favor of H 1.

It may be objected that the values for l 1 and l 2 are difficult to assess. However, what really matters is the ratio k of the two values, l 1 = k ⋅ l 2; for k ≠ 1 this is an asymmetric loss function. Starting from prior odds equal to 1, the criterion in (4.2) may then be rewritten as follows:

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mathrm{BF}>k. \end{array} \end{aligned} $$
(4.3)

Stated otherwise, whenever the competing hypotheses are considered equally probable a priori, the decision d 1 will be optimal if BF > k, that is, if wrongly deciding d 1 (i.e., when H 2 holds) is less than BF times worse than wrongly deciding d 2 (i.e., when H 1 holds). Clearly, the prior odds need not be equal to 1, and the criterion can be adapted accordingly.

4.2.2.1 Choosing the Parameters of the Dirichlet Prior

The problem of how to elicit a prior probability distribution about a proportion has been discussed in Sect. 1.10. In the type of case considered here, an analyst will face the problem of eliciting a prior opinion about a set of proportions, assuming that the subjective prior distribution is chosen from the family of Dirichlet distributions.

There are various options for the hyperparameters α i1, …, α ik, characterizing the prior probability distribution on the proportions θ i1, …, θ ik. One is the uniform prior probability distribution, with α ij = 1, j = 1, …, k. Whenever further information is available in terms of the number of outcomes in the distinct categories, e.g., x i1, …, x ik, the hyperparameters α ij can be updated to α ij + x ij.

There are cases, however, where the analyst is able to specify a non-uniform prior probability distribution about the proportions. Following the methodology illustrated in Zapata-Vazquez et al. (2014), the prior probability distribution about a set of proportions θ i1, …, θ ik can be elicited using tools available in the package SHELF (Oakley, 2008). The user is only asked to provide a lower (e.g., 0.25), a median, and an upper (e.g., 0.75) quantile for each marginal proportion, whose density is modeled by a beta distribution. Details follow in the next example. The reader can also refer to O’Hagan et al. (2006), where a practical example is provided.

Example 4.2 (Gunshot Residue Particles)

Consider a case in which a given number of particles (266) have been collected and analyzed by a scientist. The particles have been collected from a target surface (e.g., a person’s hands). The counts of gunshot residue particles are as follows:

Chemical class            PbSbBa   PbSb   SbBa   Sb(Sn)   Pb   PbSnPb
Number of particles       18       36     2      150      38   22

Total number of particles: 266

The scientist is asked to help discriminate between the following two propositions:

H1: The gunshot residue particles are of type D (Beretta pistol with Luger 9 mm ammunition).

H2: The gunshot residue particles are of type E (Margolin pistol with Sporting 5.6 mm ammunition).

One way to elicit the Dirichlet distribution in the case here is to use observed frequencies of particles in various chemical classes as reported in previous studies (e.g., Brozek-Mucha & Jankowicz, 2001). Suppose that the elicited expert judgments for the marginal proportions characterizing category D are as follows:

Quartiles (%)   PbSbBa   PbSb   SbBa   Sb(Sn)   Pb     PbSnPb
Lower           5.00     9.00   0.40   66       9.00   7.60
Median          5.25     9.25   0.45   68       9.25   7.80
Upper           5.50     9.50   0.50   70       9.50   8.00

and those characterizing category E:

Quartiles (%)   PbSbBa   PbSb   SbBa   Sb(Sn)   Pb   PbSnPb
Lower           2.35     7.00   0.13   56       24   5.60
Median          2.55     7.50   0.15   58       26   5.80
Upper           2.75     8.00   0.17   60       28   6.00

Consider, first, the elicitation of the Dirichlet distribution concerning the first population, Dir(θ 11, …, θ 1k ∣ α 11, …, α 1k). Starting from the given lower quartile, median, and upper quartile for each marginal proportion, the prior distribution can be elicited as follows.

The function fitdist, available in the package SHELF, allows one to fit a parametric distribution starting from the elicited probabilities. In the example here, the parameters of the elicited beta distribution for each proportion are of interest.
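A sketch of this step for category D, with the quartiles of the table above expressed as proportions; the object names f11 to f16 are chosen here for illustration, and the call signature fitdist(vals, probs, lower, upper) is that of the SHELF package:

```r
library(SHELF)

q <- c(0.25, 0.5, 0.75)   # elicited quartiles

# Category D: one beta fit per chemical class
f11 <- fitdist(vals = c(0.050, 0.0525, 0.055), probs = q, lower = 0, upper = 1)  # PbSbBa
f12 <- fitdist(vals = c(0.090, 0.0925, 0.095), probs = q, lower = 0, upper = 1)  # PbSb
f13 <- fitdist(vals = c(0.004, 0.0045, 0.005), probs = q, lower = 0, upper = 1)  # SbBa
f14 <- fitdist(vals = c(0.660, 0.6800, 0.700), probs = q, lower = 0, upper = 1)  # Sb(Sn)
f15 <- fitdist(vals = c(0.090, 0.0925, 0.095), probs = q, lower = 0, upper = 1)  # Pb
f16 <- fitdist(vals = c(0.076, 0.0780, 0.080), probs = q, lower = 0, upper = 1)  # PbSnPb
```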

The last six objects contain the parameters of the beta distribution that is fitted for each marginal proportion. For example, the parameters α 1 and β 1 of the elicited beta distribution of θ 1 (i.e., proportion of gunshot residue particles in category PbSbBa) can be obtained as
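For instance, assuming the fitted objects are named as above, the Beta component of a SHELF fit holds the fitted shape parameters:

```r
f11$Beta   # fitted shape parameters (shape1, shape2) for the PbSbBa proportion
```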

Next, fit the Dirichlet distribution to the elicited marginals by means of the function fitDirichlet that is available in the same package.

The Dirichlet parameters α 11, …, α 1k can be read off from the row shape 1 and will be stored in a vector named a1.
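A sketch of this step, assuming fitDirichlet returns the vector of fitted Dirichlet parameters (they can otherwise be read from the printed summary and entered manually); the marginal fits f11 to f16 are those introduced above:

```r
# Fit a Dirichlet distribution to the six elicited beta marginals (category D)
d1 <- fitDirichlet(f11, f12, f13, f14, f15, f16,
                   categories = c("PbSbBa", "PbSb", "SbBa", "Sb(Sn)", "Pb", "PbSnPb"),
                   n.fitted = "min")

# Store the fitted Dirichlet parameters
a1 <- as.numeric(d1)
```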

Parameter n of the Dirichlet prior (the sum of the Dirichlet parameters) is chosen as the smallest of the sums of the beta parameters across the elicited marginals (input n.fitted set equal to min). See Oakley (2008) for more details.

In the same way, the Dirichlet distribution concerning the second population, Dir(θ 21, …, θ 2kα 21, …, α 2k), can be elicited.

The Dirichlet parameters α 21, …, α 2k can be read off analogously from the row shape 1 (not shown here) and will be stored in a vector named a2.

The counts of gunshot residue particles are
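The counts from the table above can be stored as a vector (ordered as the chemical classes):

```r
# Observed counts in the six chemical classes (PbSbBa, PbSb, SbBa, Sb(Sn), Pb, PbSnPb)
counts <- c(18, 36, 2, 150, 38, 22)
sum(counts)   # 266 particles in total
```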

The density of a Dirichlet-multinomial distribution can be calculated using the function ddirmnom that is available in the package extraDistr (Wolodzko, 2020), and the Bayes factor can be obtained straightforwardly
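A sketch of the computation, assuming a1 and a2 hold the elicited Dirichlet parameters for categories D and E, and that extraDistr is loaded:

```r
# Dirichlet-multinomial marginal likelihoods under H1 and H2, and their ratio
BF <- ddirmnom(counts, size = sum(counts), alpha = a1) /
      ddirmnom(counts, size = sum(counts), alpha = a2)
BF
```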

The Bayes factor provides moderately strong support for the hypothesis that the gunshot residue particles originate from a Beretta pistol with Luger 9 mm ammunition rather than from a Margolin pistol with Sporting 5.6 mm ammunition.

Assume π 1 = π 2, so that the prior odds are equal to 1. If a “0 − l i” loss function is introduced, then decision d 1, classifying the gunshot residue particles into category D, is to be preferred to the alternative decision d 2 unless wrongly deciding d 1 is felt to be more than 659 times worse than wrongly deciding d 2 (i.e., classifying the particles into category E when they are in fact of type D).

Note that by choosing a “0 − 1” loss function, or a symmetric “0 − l i” loss function with l 1 = l 2, a BF greater than 1 (or, more generally, greater than π 2∕π 1 for unequal prior probabilities) provides a criterion for addressing the classification problem. The aim here was to show that, when assuming equal prior probabilities for the hypotheses being compared, it is not sufficient for decision d 2 to be optimal that the loss assigned to the adverse consequence of decision d 1 is greater than the loss assigned to the adverse consequence of decision d 2: this loss must be more than roughly 659 times greater.

4.2.2.2 More than Two Populations

Consider now the case where more than two weapons (and related ammunitions) could be at the origin of the collected gunshot particles. Suppose that a third weapon is taken into consideration and that the competing propositions are specified as follows:

H1: The gunshot residue particles are of type D (Beretta pistol with Luger 9 mm ammunition; population p 1).

H2: The gunshot residue particles are of type E (Margolin pistol with Sporting 5.6 mm ammunition; population p 2).

H3: The gunshot residue particles are of type F (TT-33 pistol with Tokarev 7.62 mm ammunition; population p 3).

As discussed in Sect. 1.6, the expert may calculate the marginal likelihood \(f_{H_i}(y)\) (i.e., a Dirichlet-multinomial distribution) for each proposition and report a scaled version as in (1.27), that is,

$$\displaystyle \begin{aligned} \begin{array}{rcl} f_{H_i}^*(y)=\frac{f_{H_i}(y)}{\sum_{j=1}^{3}f_{H_j}(y)}, \end{array} \end{aligned} $$

or the posterior probabilities

$$\displaystyle \begin{aligned} \begin{array}{rcl} \Pr(H_i\mid y)=\frac{\Pr(H_i)f_{H_i}^*(y)}{\sum_{j=1}^{3} \Pr(H_j)f_{H_j}^*(y)}, \qquad \qquad i=1,\dots,3. \end{array} \end{aligned} $$

Alternatively, the analyst may also consider the possibility of summarizing propositions H 2 and H 3 into one as \(\bar H_1=H_2\cup H_3\). A pair of competing propositions may thus be formulated as follows:

H1: The gunshot residue particles are of type D (Beretta pistol with Luger 9 mm ammunition; population p 1).

\(\bar H_1\): The gunshot residue particles are of type E (Margolin pistol with Sporting 5.6 mm ammunition; population p 2) or of type F (TT-33 pistol with Tokarev 7.62 mm ammunition; population p 3).

The Bayes factor can be obtained as in (1.28), that is,

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mathrm{BF }=\frac{f_{H_1}(y)\sum_{i=2}^{3}\Pr(p_{i})}{f_{\bar H_1}(y)}, \end{array} \end{aligned} $$
(4.4)

where

$$\displaystyle \begin{aligned} \begin{array}{rcl} f_{\bar H_1}(y)=\sum_{i=2}^{3}\Pr(p_{i})\int_{\varTheta_i}f(y\mid\theta_i)\pi(\theta_i\mid p_i)d\theta_i. \end{array} \end{aligned} $$

Example 4.3 (Gunshot Residue Particles—Continued)

Recall Example 4.2, and suppose that the elicited expert judgments for the marginal proportions characterizing category F are as follows:

Quartiles (%)   PbSbBa   PbSb   SbBa   Sb(Sn)   Pb     PbSnPb
Lower           6.00     4.50   3.00   65       14.0   3.00
Median          6.15     4.75   3.25   67       14.5   3.25
Upper           6.30     5.00   3.50   69       15.0   3.50

The Dirichlet distribution concerning this new combination of weapon/ammunition can be elicited as before:

The Dirichlet parameters α 31, …, α 3k can be read off from the row shape 1 (not shown here) and will be stored in a vector named a3.
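A sketch of this elicitation, following the same steps as for categories D and E (quartiles expressed as proportions; object names chosen for illustration):

```r
# Category F: beta fits for each marginal proportion, then the Dirichlet fit
f31 <- fitdist(c(0.060, 0.0615, 0.063), q, 0, 1)   # PbSbBa
f32 <- fitdist(c(0.045, 0.0475, 0.050), q, 0, 1)   # PbSb
f33 <- fitdist(c(0.030, 0.0325, 0.035), q, 0, 1)   # SbBa
f34 <- fitdist(c(0.650, 0.6700, 0.690), q, 0, 1)   # Sb(Sn)
f35 <- fitdist(c(0.140, 0.1450, 0.150), q, 0, 1)   # Pb
f36 <- fitdist(c(0.030, 0.0325, 0.035), q, 0, 1)   # PbSnPb

d3 <- fitDirichlet(f31, f32, f33, f34, f35, f36,
                   categories = c("PbSbBa", "PbSb", "SbBa", "Sb(Sn)", "Pb", "PbSnPb"),
                   n.fitted = "min")
a3 <- as.numeric(d3)
```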

The scaled version of the marginal likelihoods can be easily obtained as
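A sketch of the computation, with a1, a2, and a3 as obtained above:

```r
# Dirichlet-multinomial marginal likelihoods under the three propositions
f1 <- ddirmnom(counts, size = sum(counts), alpha = a1)
f2 <- ddirmnom(counts, size = sum(counts), alpha = a2)
f3 <- ddirmnom(counts, size = sum(counts), alpha = a3)

# Scaled likelihoods as in (1.27)
f.scaled <- c(f1, f2, f3) / (f1 + f2 + f3)
f.scaled
```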

Note that the scaled likelihoods \(f^*_{H_i}(y)\) are equivalent to the posterior probabilities \(\Pr (H_i\mid y)\) whenever the prior probabilities of the three propositions are equal.

Alternatively, suppose that propositions H 2 and H 3 are summarized as above, i.e., \(\bar H_1=H_2\cup H_3\), and that the prior probabilities of H 1 and \(\bar H_1\) are equal, so that \(\Pr (H_1)=0.5\) and \(\Pr (H_2)=\Pr (H_3)=0.25\).

The Bayes factor can then be obtained as
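A sketch of the computation of (4.4), reusing the marginal likelihoods f1, f2, and f3 obtained above:

```r
# Prior probabilities: Pr(H1) = 0.5, Pr(H2) = Pr(H3) = 0.25
pr2 <- 0.25
pr3 <- 0.25

# Marginal likelihood under the composite proposition and Bayes factor as in (4.4)
f.bar <- pr2 * f2 + pr3 * f3
BF <- f1 * (pr2 + pr3) / f.bar
BF
```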

4.3 Continuous Data

The previous section considered the evaluation of scientific evidence in the form of discrete data for investigative purposes. However, for many types of scientific evidence, measurements lead to continuous data. In this section, we discuss parametric and non-parametric models for continuous data.

4.3.1 Normal Model and Known Variance

Suppose that tablets of unknown source are seized, and the question is whether they belong to population A or population B, which differ in color dye concentration. The propositions of interest are as follows:

H1: The seized tablets come from population A.

H2: The seized tablets come from population B.

The measurement of color dye concentration leads to continuous data for which a normal distribution is considered appropriate, say \(X_A\sim \mbox{N}(\theta _A,\sigma _A^2)\) for population A and \(X_B\sim \mbox{N}(\theta _B,\sigma _B^2)\) for population B. Suppose that the variance of color dye concentration in each population is known. For the population means, conjugate normal prior distributions are introduced, i.e., \(\theta _A\sim \mbox{N}(\mu _A,\tau ^2_A)\) and \(\theta _B\sim \mbox{N}(\mu _B,\tau ^2_B)\).

The analysis of a tablet of unknown origin yields the measurement y. The Bayes factor can be obtained as in (1.26), where the marginal likelihoods \(f_{H_i}(y)\) are still normal with mean equal to the prior mean μ and variance equal to the sum of the prior variance τ 2 and the population variance σ 2, \(f_{H_i}(y)=\mbox{N}(\mu ,\tau ^2+\sigma ^2)\).

Whenever several measurements (y 1, …, y n) are available, it is sufficient to recall that the joint likelihood is proportional to the likelihood of the sample mean \(\bar y\), which is normally distributed, \(\bar Y\sim \mbox{N}(\theta ,\sigma ^2/n)\), and that the marginal likelihood evaluated at the sample mean \(\bar y\) becomes \(f_{H_i}(\bar y)=\mbox{N}(\mu ,\tau ^2+\sigma ^2/n)\).

Example 4.4 (Color Dye Concentration in Ecstasy Tablets)

A tablet of unknown origin is analyzed, and the measured color dye concentration is 0.16 (measurements are in %). A prior probability distribution is elicited for the mean of population A, θ A ∼N(0.14, 0.003²), and for the mean of population B, θ B ∼N(0.3, 0.016²). The population variances \(\sigma _A^2\) and \(\sigma _B^2\) are assumed to be known and equal to 0.01² and 0.06², respectively (Goldmann et al., 2004).

The Bayes factor in (1.26) can be obtained straightforwardly as the ratio of two normal likelihoods evaluated for the available measurement of color dye concentration y.
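A sketch of the computation (object names chosen for illustration):

```r
y <- 0.16

# Marginal likelihoods N(mu, tau^2 + sigma^2) under H1 (population A) and H2 (population B)
f1 <- dnorm(y, mean = 0.14, sd = sqrt(0.003^2 + 0.01^2))
f2 <- dnorm(y, mean = 0.30, sd = sqrt(0.016^2 + 0.06^2))
BF <- f1 / f2
BF   # roughly 12
```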

The Bayes factor provides moderate support for the proposition according to which the analyzed tablet comes from population A, rather than the proposition according to which it comes from population B. Note again that this result does not mean that proposition H 1 is more probable than proposition H 2. It solely means that the probability of observing the concentration y is roughly 12 times greater if the tablet originates from population A rather than from population B. The posterior odds might be in favor of proposition H 2 even in the presence of a Bayes factor greater than 1, if the prior probability of proposition H 1 is sufficiently small. In the case at hand, it can easily be verified that the prior probability of proposition H 1 needs to be smaller than 0.07 for the posterior odds to favor H 2.

Suppose now that n = 5 tablets are available, and the color dye concentration measurements are y = (0.155, 0.160, 0.165, 0.161, 0.159). The value of the evidence can then be computed for the sample mean
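A sketch of the computation, using the marginal likelihood of the sample mean:

```r
y <- c(0.155, 0.160, 0.165, 0.161, 0.159)
n <- length(y)

# Marginal likelihoods of the sample mean, N(mu, tau^2 + sigma^2 / n)
f1 <- dnorm(mean(y), mean = 0.14, sd = sqrt(0.003^2 + 0.01^2 / n))
f2 <- dnorm(mean(y), mean = 0.30, sd = sqrt(0.016^2 + 0.06^2 / n))
BF <- f1 / f2
BF
```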

The Bayes factor now provides moderately strong support for the proposition H 1, compared to proposition H 2. This is a direct effect of the increased number of measurements.

4.3.2 Normal Model and Unknown Variance

In some applications, both parameters are unknown, and a prior distribution for the population mean and the population variance must be introduced. A non-informative or a subjective prior distribution may be chosen, as mentioned previously in Sect. 3.3.2.

Consider a case where skeletal remains are analyzed, and the question is whether they belong to a man or a woman. The competing propositions are as follows:

H1: The skeletal remains belong to a woman.

H2: The skeletal remains belong to a man.

The study of Benazzi et al. (2009) found that the measurement of the sacral base is a useful indicator of sex.

Consider a normal probability distribution for the area of the sacral base \(X_F\sim \mathrm {N}(\theta _F,\sigma ^2_F)\) for the population of females, and \(X_M\sim \mathrm {N}(\theta _M,\sigma ^2_M)\) for the population of males. A conjugate prior probability distribution \(f(\theta _i,\sigma ^2_i)\) can be assumed for \((\theta _i,\sigma ^2_i)\) as in (3.12), where \((\theta _i\mid \sigma ^2_i)\sim \mathrm {N}(\mu _i,\sigma ^2_i/n_i)\) and \(\sigma ^2_i\sim S_i\cdot \chi ^{-2}(k_i)\), i = {F, M}. This amounts to an inverse gamma distribution with shape parameter α i = k i∕2 and scale parameter β i = S i∕2, \(\sigma ^2_i\sim \mathrm {IG}(k_i/2,S_i/2)\).

The marginal density needed to compute the BF, \(f_{H_i}(\cdot )\), is a Student t distribution with k i degrees of freedom, centered at μ i, with spread parameter, denoted here sp i, equal to

$$\displaystyle \begin{aligned}sp_i=\frac{n_i}{n_i+1}\alpha_i\beta_i^{-1} \end{aligned}$$

(as noted previously in Sect. 3.3.2). Note that in this case there is a single available measurement (n y = 1).

Example 4.5 (Sex Discrimination for Skeletal Remains)

The sacral base of the recovered skeletal remains is measured, and its area is found to be 11.5 cm². The prior probability distribution for \((\theta _i,\sigma ^2_i)\), i = {F, M}, as illustrated in Sect. 3.3.2, is elicited based on the following population data:

Population               Females   Males
Number of individuals    38        35
Sample mean (cm²)        10.35     14.09
Std dev (cm²)            1.42      1.52

The prior distribution for \((\theta _F\mid \sigma ^2_F)\) and \((\theta _M\mid \sigma ^2_M)\) can be centered at μ F = 10.35 and μ M = 14.09, respectively, with n F = 38 and n M = 35.

The prior distribution for \(\sigma ^2_F\) and \(\sigma ^2_M\) can be elicited using the parameter value k = 20 (as in Example 3.6) and choosing S F and S M such that

$$\displaystyle \begin{aligned} \begin{array}{rcl} \Pr(\sigma^2_F>1.42^2)=\Pr(\sigma^2_M>1.52^2)=0.5 \end{array} \end{aligned} $$

The prior distributions for \(\sigma ^2_F\) and \(\sigma ^2_M\) are 39 ⋅ χ −2(20) and 45 ⋅ χ −2(20), respectively. The marginal density in the numerator of the Bayes factor is a Student t distribution with k F degrees of freedom, centered at μ F = 10.35, with spread parameter sp F = 0.5 (rounded to the second decimal).

The marginal density in the denominator of the Bayes factor is a Student t distribution with k M degrees of freedom, centered at μ M = 14.09, with spread parameter sp M = 0.44 (rounded to the second decimal).

Note that in this case k F = k M = k.
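A possible sketch of how S F, S M and the spread parameters can be computed, using the fact that the median of an S ⋅ χ −2(k) variable equals S divided by the median of a χ 2(k) variable:

```r
k <- 20
nF <- 38; nM <- 35

# S chosen so that the prior median of sigma^2 equals the observed variance
SF <- 1.42^2 * qchisq(0.5, df = k)   # approximately 39
SM <- 1.52^2 * qchisq(0.5, df = k)   # approximately 45

# Spread (precision) parameters of the marginal Student t densities
spF <- (nF / (nF + 1)) * (k / 2) / (SF / 2)   # approximately 0.5
spM <- (nM / (nM + 1)) * (k / 2) / (SM / 2)
```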

The density of a non-central Student t distributed random variable can be calculated using the function dstp available in the package LaplacesDemon (Hall et al., 2020). The Bayes factor can be obtained as follows:
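A sketch of the computation, using the rounded spread parameters reported above:

```r
library(LaplacesDemon)

y <- 11.5

# Student t marginal densities (precision parameterization) and Bayes factor
BF <- dstp(y, mu = 10.35, tau = 0.5, nu = 20) /
      dstp(y, mu = 14.09, tau = 0.44, nu = 20)
BF
```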

This value provides weak support for the proposition according to which the skeletal remains belong to a woman rather than a man.

4.3.3 Non-Normal Model

As pointed out in Sect. 3.4.1.2, certain types of observations lack sufficient regularity to apply standard parametric models.

Consider a case where banknotes are seized on an individual following an arrest. A question commonly asked in such a case is whether the seized banknotes come from a population of banknotes used in drug dealing activities. The following propositions may thus be formulated:

H1: The seized banknotes have been used in illegal drug dealing activities (population p 1).

H2: The seized banknotes are from general circulation (population p 2).

Figure 4.1 shows histograms of drug intensities measured on banknotes from drug trafficking (left) and general circulation (right). It can immediately be observed that the distributions for the two populations are different, that the distribution related to banknotes involved in drug trafficking is not unimodal, and that the one for banknotes in general circulation is positively skewed (Besson, 2004).

Fig. 4.1 Drug intensity measured on banknotes of 200 euro in a population of banknotes from drug trafficking (left) and general circulation (right) (Besson, 2004)

Suppose a database is available \(\{{\mathbf {z}}_l=(z_{l1},\dots ,z_{lm_l}),\; l=1,2\}\). The probability distribution for population p l, f l(⋅), can be estimated by means of kernel density estimation \(\hat f_l(\cdot )\) as

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \hat f_l(y\mid z_{l1},\dots,z_{lm_l})=\frac{1}{m_l} \sum_{i=1}^{m_l}\text{K}(y\mid z_{li},h_l), \end{array} \end{aligned} $$
(4.5)

where K(yz li, h l) is taken to be a normal distribution centered at z li with variance equal to \(h_l^2s_l^2\), \(s^2_l=\sum _{i=1}^{m_l}(z_{li}-\bar {z}_l)^2/(m_l-1)\), and \(\bar {z}_l=\sum _{i=1}^{m_l}z_{li}/m_l\).

The estimate \(\hat f_l(y)\) of the probability density is obtained by summing the individual kernel densities over all observations in the database and then dividing by the number of observations.

Figure 4.2 shows the kernel density estimates \(\hat f_1(y\mid z_{11},\dots ,z_{1m_1})\) and \(\hat f_2(y\mid z_{21},\dots ,z_{2m_2})\) obtained using (4.5) with the smoothing parameter set equal to 0.15 for both populations. It can be observed that the kernel density estimates capture the multimodality and skewness of the data and thus provide a better representation of the available measurements.

Fig. 4.2 Drug intensity measured on banknotes of 200 euro in a population of banknotes from drug trafficking (left) and general circulation (right), and associated kernel density estimates with smoothing parameter h equal to 0.15

Starting from the available measurements y = (y 1, …, y n) on a sample of size n, a Bayes factor can be obtained as

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \mathrm{BF }=\frac{f_{H_1}(y)}{f_{H_2}(y)}=\frac{\prod_{i=1}^{n}\hat f_1(y_i\mid z_{11},\dots,z_{1m_1})}{\prod_{i=1}^{n}\hat f_2(y_i\mid z_{21},\dots,z_{2m_2})}. \end{array} \end{aligned} $$
(4.6)

Example 4.6 (Contaminated Banknotes)

Consider a case in which 8 banknotes are seized on a person of interest. Laboratory analyses of the banknotes reveal drug intensities [du] equal to y = (322, 158, 114, 125, 361, 801, 798, 135). A database named banknotes.Rdata is available on the book’s website. It contains sample data for drug intensities on banknotes from drug trafficking and general circulation (Fig. 4.1). Note that these are hypothetical data used for the sole purpose of illustration. The (m 1 × 1) vector of measurements on banknotes from drug trafficking is extracted and denoted pop1; analogously, the (m 2 × 1) vector of measurements on banknotes from general circulation is extracted and denoted pop2.

The smoothing parameters h 1 and h 2 are set equal to 0.15. The variances of drug intensity in each population, \(s^2_1\) and \(s^2_2\), are estimated by the corresponding sample variances.
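A sketch of this setup, assuming banknotes.Rdata provides the two vectors pop1 and pop2 directly:

```r
load("banknotes.Rdata")   # assumed to provide the vectors pop1 and pop2

h1 <- h2 <- 0.15

# Sample variances of drug intensity in the two populations
s1 <- var(pop1)
s2 <- var(pop2)
```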

The kernel density estimation in (4.5) for the numerator and the denominator is computed by means of the functions kn1 and kn2, respectively.
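A minimal sketch of these functions, following the argument structure described in Sect. 4.5 (x is the evaluation point, the second argument holds the database values, and the third the kernel variance h²s²):

```r
# Kernel density estimate (4.5) at a point x: normal kernels centered at the
# database values, each with variance sk = h^2 * s^2
kn1 <- function(x, pop1, sk1) {
  mean(dnorm(x, mean = pop1, sd = sqrt(sk1)))
}
kn2 <- function(x, pop2, sk2) {
  mean(dnorm(x, mean = pop2, sd = sqrt(sk2)))
}
```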

The estimated probability densities are represented in Fig. 4.2.

Consider now the vector of measurements y. The probability densities are estimated as in (4.5):
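A sketch of this step, evaluating the kernel density estimates at each seized measurement:

```r
y <- c(322, 158, 114, 125, 361, 801, 798, 135)

# Estimated densities of each measurement under the two populations
fy1 <- sapply(y, kn1, pop1 = pop1, sk1 = h1^2 * s1)
fy2 <- sapply(y, kn2, pop2 = pop2, sk2 = h2^2 * s2)
```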

and the Bayes factor is obtained as in (4.6):
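```r
# Bayes factor (4.6): ratio of the products of the estimated densities
BF <- prod(fy1) / prod(fy2)
BF
```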

The Bayes factor represents moderate support for the proposition according to which the seized banknotes have been used in illegal drug trafficking rather than the proposition according to which they are part of the general circulation.

Sensitivity to the Choice of the Smoothing Parameter

The sensitivity of the BF to the choice of the smoothing parameter may be a cause of concern, as different choices may be made. The smoothing parameter h determines the shape of the estimated probability density: if it is (too) large, the curve \(\hat f(y)\) will be (very) smooth; on the other hand, if it is (too) small, the resulting curve will be spiky. Figure 4.3 shows, for both populations, the density curves obtained with h = 0.1 (dotted line), h = 0.15 (solid line), h = 0.2 (dashed line), and h = 0.25 (dot-dashed line). The Bayes factor for the available measurements in Example 4.6 can then be calculated for several choices of the smoothing parameter h, as sketched below.
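The grid of h values below is illustrative and extends beyond the values shown in Fig. 4.3 to include two clearly large values:

```r
# Recompute the BF over a grid of smoothing parameters
h.values <- c(0.1, 0.15, 0.2, 0.25, 0.5, 1)
BF.h <- sapply(h.values, function(h) {
  fy1 <- sapply(y, kn1, pop1 = pop1, sk1 = h^2 * s1)
  fy2 <- sapply(y, kn2, pop2 = pop2, sk2 = h^2 * s2)
  prod(fy1) / prod(fy2)
})
round(BF.h, 2)
```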

Fig. 4.3 Sample data used in Example 4.6 regarding drug intensities on banknotes for a population of banknotes from drug trafficking (top) and in general circulation (bottom), and associated kernel density estimates with smoothing parameter h equal to 0.1 (dashed line), 0.15 (solid line), 0.2 (dotted line), and 0.25 (dot-dashed line)

Note that the last two values correspond to large values of the smoothing parameter h, providing a very smooth curve.

4.4 Multivariate Data

As mentioned in Sect. 3.4, analysts frequently encounter multivariate data because the features of examined items and materials, such as handwritten or printed documents, glass fragments, or skeletal remains, can be described by more than one variable. Such data often present a complex dependence structure with a large number of variables and multiple levels of variation.

4.4.1 Normal Multivariate Data

The classification of skeletal remains on the basis of sexual dimorphism is a common problem in paleontology. Section 4.3.2 dealt with the question of how to quantify the evidential value of measurements of a given morphological trait (e.g., the profile of the sacral base). A number of studies have documented sex differences in particular pelvic traits, such as the obturator foramen, that tend to be oval in males and triangular in females. The shape of these traits can be described quantitatively by Fourier descriptors following the image analysis procedure developed by Bierry et al. (2010). Each item can be described by means of several variables, i.e., the amplitude and the phase of the first three harmonics.

Suppose that observations are available from a p-dimensional multivariate normal distribution whose mean vector and variance–covariance matrix are θ l and W l, respectively, Z li ∼N(θ l, W l), l = 1, 2 (where l = 1 stands for the population of females and l = 2 for the population of males). Suppose further that the prior distribution about (θ l, W l) is chosen in the conjugate family of the normal-inverse Wishart distribution NIW(Ω l, ν l, μ l, c l):

$$\displaystyle \begin{aligned} \begin{array}{rcl} f(\boldsymbol\theta_l,W_l)\propto\mid W_l\mid ^{-(\nu_l+p+2)/2}\exp\left\{-\frac{c_l}{2}(\boldsymbol\theta_l\!-\!\boldsymbol\mu_l)'W_l^{-1}(\boldsymbol\theta_l\!-\!\boldsymbol\mu_l)-\frac 1 2 \mathrm{tr}(W_l^{-1}\varOmega_l)\right\}, \end{array} \end{aligned} $$

where μ l is the center vector, c l are the degrees of freedom associated with the center vector μ l, Ω l is the dispersion matrix, and ν l are the degrees of freedom associated with the dispersion matrix Ω l (O’Hagan & Kendall, 1994).

Consider now a case where skeletal remains are recovered, and the following propositions are of interest:

H1: The skeletal remains belong to a woman (i.e., a member of population p 1).

H2: The skeletal remains belong to a man (i.e., a member of population p 2).

Denote by y = (y 1, …, y p) the measurements (i.e., Fourier descriptors) related to the item whose origin is unknown and that needs to be classified. The marginal distribution under the competing propositions H 1 and H 2, \(f_{H_l}(\mathbf {y})\) for l = 1, 2, can be obtained as

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} f(\mathbf{y}\mid\boldsymbol\mu_l,c_l,\varOmega_l,\nu_l)& =&\displaystyle \int_{\boldsymbol\theta_l,W_l}f(\mathbf{ y}\mid\boldsymbol\theta,W)f(\boldsymbol\theta,W)d({\boldsymbol\theta,W})\\ & \propto&\displaystyle \left\{1\!+(\mathbf{y}-\boldsymbol\mu_l)'\left[\frac{c_l\!+\!1}{c_l}\varOmega_l\right]^{-1}(\mathbf{ y}-\boldsymbol\mu_l)\right\}^{-(\nu_l+1)/2}. \end{array} \end{aligned} $$
(4.7)

This is a p-dimensional Student t distribution with δ l = ν l + 1 − p degrees of freedom, location μ l, and scale matrix

$$\displaystyle \begin{aligned}\varDelta_l=\frac{(c_l+1)\varOmega_l}{(c_l\delta_l)}.\end{aligned}$$

The Bayes factor can be obtained as

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mathrm{BF }=\frac{f(\mathbf{y}\mid \boldsymbol\mu_1,c_1,\varOmega_1,\nu_1)}{f(\mathbf{y}\mid \boldsymbol\mu_2,c_2,\varOmega_2,\nu_2)}. \end{array} \end{aligned} $$

4.4.1.1 Prior Distribution for the Unknown Mean and Variance

Four parameters must be elicited. The elicitation of μ l is rather simple. Since μ l represents the mean, the median, and the mode of the prior probability distribution, the analyst may assess any of these summaries (O’Hagan et al., 2006). A procedure for the elicitation of the degrees of freedom c and ν and the dispersion matrix Ω has been provided by Al-Awadhi and Garthwaite (1998).

Here, suppose a non-informative prior distribution is used:

$$\displaystyle \begin{aligned} \begin{array}{rcl} f({\boldsymbol\theta}_l,W_l)\propto \mid W_l\mid^{-(p+1)/2}. \end{array} \end{aligned} $$

A database is available, with n 1 measurements for the population of females (p 1) and n 2 measurements for the population of males (p 2). The corresponding posterior distributions (one for the numerator, one for the denominator) can be written as

$$\displaystyle \begin{aligned} \begin{array}{rcl} (\boldsymbol{\theta}_l\mid {\mathbf{z}}_l,\varSigma_l)& \sim &\displaystyle \mathrm{N}(\bar{\mathbf{z}}_l,\varSigma_l/n_l) \end{array} \end{aligned} $$
(4.8)
$$\displaystyle \begin{aligned} \begin{array}{rcl} (\varSigma_l\mid {\mathbf{z}}_l)& \sim&\displaystyle \mathrm{IW}(S_l,n_l-1), \end{array} \end{aligned} $$
(4.9)

where \(S_l=\sum _{i=1}^{n_l}({\mathbf {z}}_{li}-\bar {\mathbf {z}}_l)({\mathbf {z}}_{li}-\bar {\mathbf {z}}_l)'\) is the sum of the squares about the sample mean and \(\bar {\mathbf {z}}_l=\sum _{j=1}^{n_l}{\mathbf {z}}_{lj}/n_l\).

The marginal likelihood \(f_{H_l}(\mathbf {y})\) is, therefore, a p-dimensional Student t distribution with n l − p degrees of freedom, location vector \(\bar {\mathbf { z}}_l\), and scale matrix

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} F_l=\frac{(n_l+1)S_l}{n_l(n_l-p)}, \end{array} \end{aligned} $$
(4.10)

so that \((\mathbf {y}\mid \bar {\mathbf {z}}_l,F_l,n_l-p)\sim t_{n_l-p}(\bar {\mathbf {z}}_l,F_l)\).

Example 4.7 (Sex Discrimination for Skeletal Remains Using Multivariate Data)

Skeletal remains are recovered, and the obturator foramen area is measured. The measurements of the first three pairs of Fourier descriptors are as follows:

Harmonic   Amplitude   Phase
First      0.083095    2.6527709
Second     0.932333    0.4530559
Third      0.413736    0.3174581

Suppose that two databases of dimensions (n 1 × p) = (51 × 6) and (n 2 × p) = (50 × 6) are available for the population of women and men, respectively. These two databases can be used to obtain the summaries \(\bar {\mathbf {z}}_1\), \(\bar {\mathbf {z}}_2\) (i.e., the location vectors) and S 1, S 2 (i.e., the sums of squares about the sample means) that are needed to calculate the marginal probability densities of the available measurements under the competing propositions. The location vectors \(\bar {\mathbf {z}}_1\) and \(\bar {\mathbf {z}}_2\) and the sums of squares about the sample means S 1 and S 2 can be obtained straightforwardly as
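A sketch of this computation for one population (the same code applies to the other):

```r
zbar <- colMeans(population)                       # location vector
S    <- (nrow(population) - 1) * cov(population)   # sums of squares about the mean
```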

where population is a data frame of dimension (n × p) containing the available data. Note that only the summaries \(\bar {\mathbf {z}}_1\), \(\bar {\mathbf {z}}_2\), S 1, and S 2, as well as the vector of measurements y, are available in the database skeletal.Rdata and can be obtained as
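A sketch of this step; the object names stored in skeletal.Rdata are assumed here to be m1, m2, S1, S2, and y:

```r
load("skeletal.Rdata")   # assumed to provide m1, m2, S1, S2 and the measurement vector y
```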

The marginal density \(f_{H_1}(\mathbf {y})\) in the numerator of the Bayes factor is a p-dimensional Student t distribution with n 1 − p = 45 degrees of freedom, location m1 as above, and scale matrix
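```r
n1 <- 51; p <- 6

# Scale matrix (4.10) for the female population
F1 <- (n1 + 1) * S1 / (n1 * (n1 - p))
```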

The marginal density \(f_{H_2}(\mathbf {y})\) in the denominator of the Bayes factor is a p-dimensional Student t distribution with n 2 − p = 44 degrees of freedom, location m2 as above, and scale matrix
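```r
n2 <- 50

# Scale matrix (4.10) for the male population
F2 <- (n2 + 1) * S2 / (n2 * (n2 - p))
```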

The density of a multivariate Student t distributed random variable can be calculated using the function dmvt available in the package LaplacesDemon (Hall et al., 2020).
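A sketch of the Bayes factor computation, assuming the objects defined above:

```r
library(LaplacesDemon)

# Multivariate Student t marginal densities under H1 and H2, and their ratio
BF <- dmvt(y, mu = m1, S = F1, df = n1 - p) /
      dmvt(y, mu = m2, S = F2, df = n2 - p)
BF
```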

The Bayes factor represents strong support for the proposition according to which the skeletal remains originate from a woman (population p 1) rather than from a man (population p 2).

As discussed in Sect. 3.4.2, it is important to study the performance of the proposed model. This can be achieved by using the available databases to generate many test cases and computing relevant performance metrics.

4.4.1.2 Classification as a Decision

The BF obtained in Example 4.7 supports proposition H 1 over H 2. However, if a decision is to be made, one needs to take into account the prior uncertainty (in terms of probabilities) about the competing propositions and the undesirability (in terms of losses) of adverse outcomes (i.e., classification errors).

Let π 1 and π 2 denote the prior probabilities of propositions H 1 and H 2. The posterior probabilities α 1 and α 2 can be easily calculated as

$$\displaystyle \begin{aligned} \begin{array}{rcl} \alpha_l=\frac{\pi_l f(\mathbf{y}\mid\boldsymbol\mu_l,c_l,\varOmega_l,\nu_l)}{\sum_{j=1}^{2}\pi_j f(\mathbf{ y}\mid\boldsymbol\mu_j,c_j,\varOmega_j,\nu_j)}, \end{array} \end{aligned} $$

where the marginals f(y ∣ μ l, c l, Ω l, ν l), l = 1, 2, are as in (4.7).

A criterion that can be used to classify the recovered item into one of the two populations has been outlined in Sect. 1.9. When using a “0 − l i” loss function (Table 1.4), the Bayes decision criterion states that the decision d 1, classifying the recovered item in the population of females (p 1), is optimal whenever

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \mathrm{BF }>\frac{l_1/l_2}{\pi_1/\pi_2}=c. \end{array} \end{aligned} $$
(4.11)

Example 4.8 (Sex Discrimination for Skeletal Remains Using Multivariate Data—Continued)

If the prior odds are 1, and a symmetric loss function is chosen (i.e., l 1 = l 2), the criterion in (4.11) says that the decision d 1 is optimal whenever BF > 1.

Assuming equal prior probabilities may be unrealistic because, often, there is at least some information suggesting that one proposition is more probable than the stated alternative. Likewise, the decision maker’s preferences among adverse outcomes may not be properly reflected by a symmetric loss function, though it should be noted that what actually matters is only the ratio of l 1 to l 2.

To investigate the effect of alternative choices for the prior odds and the loss function, one can conduct a sensitivity analysis. Figure 4.4 shows an example for the threshold c in (4.11) as a function of increasing values of the prior probability π 1 and for different asymmetric loss functions, where l 2, the loss associated with the adverse outcome of decision d 2, is fixed at 1, and l 1, associated with the adverse outcome of decision d 1, is equal to 10, 50, and 100.
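A sketch of how a figure like Fig. 4.4 can be produced (line types and logarithmic axis are illustrative choices):

```r
# Threshold c = (l1/l2) / (pi1/pi2) as a function of pi1, for three loss ratios
pi1 <- seq(0.01, 0.99, by = 0.01)
prior.odds <- pi1 / (1 - pi1)

plot(pi1, 100 / prior.odds, type = "l", lty = 3, log = "y",
     xlab = expression(pi[1]), ylab = "Threshold c")
lines(pi1, 50 / prior.odds, lty = 2)
lines(pi1, 10 / prior.odds, lty = 1)
```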

Fig. 4.4 Threshold c, the BF necessary for decision d 1 to have a smaller expected loss than decision d 2, as specified by Eq. (4.11), as a function of the prior probability π 1 and for different loss ratios l 1∕l 2

This analysis reveals that d 1 is not the optimal decision for very high values of l 1, compared to l 2, and for very small values of the prior probability π 1.

4.4.2 Two-Level Models

A recurrent problem in forensic practice is to help distinguish between legal and illegal cannabis plants (Bozza et al., 2014). Cannabis seedlings can be discriminated, to some extent, on the basis of their chemical profiles using chemometric tools and a methodology as described in Broséus et al. (2010). This study focused on several target compounds, taking into account their presence in drug type (illegal) and fiber type (legal) Cannabis.

Suppose a dataset is available that consists of replicate measurements (n) made on illegal plants (population p 1) and on fiber type plants (population p 2). The sample size is equal to m 1 and m 2 for populations p 1 and p 2, respectively. Background data can be denoted by z lij = (z lij1, …, z lijp), where l = 1, 2, i = 1, …, m l, j = 1, …, n, and p is the number of variables. Available data suggest that a statistical model with two levels of variation is suitable: variation between replicate measurements from the same source and variation between measurements from different sources.

4.4.2.1 Normal Distribution for the Between-Source Variation

Here we use the two-level random effect model described in Sect. 3.4.1.1. For the within-source variation, the distribution of Z lij is taken to be normal , Z lij ∼N(θ li, W l). For the between-source variation, denote the mean vector between sources by μ l, and the matrix of between-source variances and covariances by B l. The distribution of θ li is taken to be normal, θ li ∼N(μ l, B l).

Measurements are available on some seized material, denoted by y = (y 1, …, y n), where y j = (y j1, …, y jp), j = 1, …, n. A laboratory is asked to help determine the plant’s chemotype. The following propositions may be of interest:

H1: The seized plant is drug type Cannabis (population p 1).

H2: The seized plant is fiber type Cannabis (population p 2).

The probability distribution of the measurements on the seized material is modeled analogously, Y j ∼N(θ l, W l) with θ l ∼N(μ l, B l), l = 1, 2. The marginal probability densities in the numerator and denominator have the form \(f_{H_l}(\mathbf y)=f_l(\mathbf {y}\mid \boldsymbol \mu _l,W_l,B_l)\), l = 1, 2, and can be obtained as in (3.28)

$$\displaystyle \begin{aligned} &f_l(\mathbf{y}\mid \boldsymbol\mu_l,W_l,B_l)=\mid 2\pi W_l\mid^{-n/2}\mid 2\pi B_l\mid^{-1/2} \mid 2\pi (nW_l^{-1}+B_l^{-1})^{-1}\mid^{1/2}\\ &\qquad \times\exp\left\{-\frac 1 2 \left[(\bar{\mathbf{y}}-\boldsymbol\mu_l)'(n^{-1}W_l+B_l)^{-1}(\bar{\mathbf{y}}-\boldsymbol\mu_l) +\text{tr}\left(SW_l^{-1}\right)\right]\right\}, \end{aligned} $$
(4.12)

where \(S=\sum _{i=1}^{n}({\mathbf {y}}_i-\bar {\mathbf {y}})({\mathbf {y}}_i-\bar {\mathbf {y}})'\).

The Bayes factor can then be obtained as in (1.26) as a ratio between the two marginals

$$\displaystyle \begin{aligned} \mathrm{BF }&=\frac{f_{H_1}(\mathbf{y})}{f_{H_2}(\mathbf{y})}={\frac{f_1(\mathbf{y}\mid \boldsymbol{\mu}_1,W_1,B_1)}{f_2(\mathbf{y}\mid \boldsymbol{\mu}_2,W_2,B_2)}} \\ &=\left(\frac{| W_1|}{|W_2|}\right)^{-\frac n 2}\left(\frac{|B_1|}{|B_2|}\right)^{-\frac 1 2}\left(\frac{|\left(nW_1^{-1}+B_1^{-1}\right)^{-1}|}{|\left(nW_2^{-1}+B_2^{-1}\right)^{-1}|}\right)^{\frac 1 2}\\ &\quad \times\exp\left\{\sum_{i=1}^{2}(-1)^i\frac 1 2\left[ \text{tr}\left(SW_i^{-1}\right)+\left(\bar{\mathbf y}-\boldsymbol\mu_i\right)'\left(n^{-1}W_i+B_i\right)^{-1}\left(\bar{\mathbf y}-\boldsymbol\mu_i\right)\right]\right\}. \end{aligned} $$
(4.13)

The overall means μ 1 and μ 2, the within-source covariance matrices W 1 and W 2, and the between-source covariance matrices B 1 and B 2 can be estimated from the available background data using (3.32), (3.33), and (3.34).

Example 4.9 (Cannabis Seedlings)

A plant of unknown type is analyzed, and the chemical profile is extracted. Three replicate measurements are taken (n = 3) on three variables (p = 3): Cannabidiol (CBD), D9-Tetrahydrocannabinol (THC), and Cannabinol (CBN). Measurements on the item of unknown type are as follows:

CBD        THC      CBN
-1.3040    0.2310   0.6874
-1.2918    0.2400   0.7350
-1.0719    0.3176   0.9113

The mean vectors between sources μ, the within-source covariance matrices W, and the between-source covariance matrices B can be estimated from the available background data (Bozza et al., 2014).

The estimates of the overall means μ 1 and μ 2, of the within-source covariance matrices W 1 and W 2, and of the between-source covariance matrices B 1 and B 2 are available in the database plant.Rdata and can be obtained as
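A sketch of this step; the object names stored in plant.Rdata are assumed here to match those used below:

```r
load("plant.Rdata")   # assumed to provide mu1, mu2, W1, W2, B1, B2 (and the group means)
```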

These estimates can be obtained using the function two.level.mv.WB introduced in Sect. 3.4.1.1
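A sketch of such a call for each population; the data frame names and the column indices passed to variables and grouping.variable are illustrative assumptions:

```r
res1 <- two.level.mv.WB(population1, variables = 2:4, grouping.variable = 1)
res2 <- two.level.mv.WB(population2, variables = 2:4, grouping.variable = 1)
```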

where population is a data frame with the available data, variables indicates the columns where the variables are stored, and grouping.variable indicates the item number.

Given the available measurements, the Bayes factor can be calculated as in (4.13) using the function two.level.mvn.inv.BF.
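A sketch of the call, with the replicate measurements of the table above arranged as a (3 × 3) matrix and the argument order taken from Sect. 4.5:

```r
# Replicate measurements on the plant of unknown type (n = 3, p = 3)
y <- matrix(c(-1.3040, 0.2310, 0.6874,
              -1.2918, 0.2400, 0.7350,
              -1.0719, 0.3176, 0.9113),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("CBD", "THC", "CBN")))

BF <- two.level.mvn.inv.BF(y, W1, W2, B1, B2, mu1, mu2, variables = 1:3)
BF
```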

The Bayes factor represents very strong support for the proposition according to which the seized plant is of drug type rather than fiber type.

4.4.2.2 Non-normal Distribution for the Between-Source Variation

As noted in Sect. 3.4.1.2, whenever the assumption of normality for the between-source variability is considered inappropriate, the normal distribution f(θ liμ l, B l) = N(μ l, B l) previously proposed can be replaced by a kernel density estimate as in (3.35). The marginal densities \(f_{H_l}(y)\) at the numerator and denominator of the Bayes factor become

$$\displaystyle \begin{aligned} f_l(\bar{\mathbf{y}}\mid W_l,B_l,h_l)&=(2\pi)^{-p}\mid B_l\mid^{-1}(m_l h^2_l)^{-2}\mid D_l\mid^{-1/2}\mid D_l^{-1}+(h^2_lB_l)^{-1}\mid^{-1/2}\\ &\times\sum_{i=1}^{m_l}\exp\left\{-\frac 1 2 (\bar{\mathbf{y}}-\bar{\mathbf{z}}_{li})'(D_l+h_l^2B_l)^{-1}(\bar{\mathbf{y}}-\bar{\mathbf{z}}_{li})\right\}, \end{aligned} $$
(4.14)

where D l = n −1 W l. Note that this is just the marginal density of the recovered data, that is, the first line in (3.38), with all multiplicative constants.

The Bayes factor is then given by the ratio of the marginal probability densities in (4.14) for l = 1, 2, that is,

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \mathrm{BF }=\frac{f_1(\bar{\mathbf{y}}\mid W_1,B_1,h_1)}{f_2(\bar{\mathbf{y}}\mid W_2,B_2,h_2)}. \end{array} \end{aligned} $$
(4.15)

Example 4.10 (Cannabis Seedlings—Continued)

Consider again the case examined in Example 4.9, and suppose that a kernel distribution is used to model the between-source variability. First, the group means \(\bar {\mathbf z}_{li}\) must be obtained. They are returned by the function two.level.mv.WB used to estimate the model parameters.
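Assuming the group means are stored in plant.Rdata under the names gmu1 and gmu2 (matching the argument names of two.level.mvk.inv.BF in Sect. 4.5), the first rows can be displayed as follows:

```r
head(gmu1)   # group means, drug type population (one row per source)
head(gmu2)   # group means, fiber type population
```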

Here we show only the first six rows of the (m l × p) matrices, where each row represents the vector of means \(\bar {\mathbf z}_{li}=\frac 1 n \sum _{j=1}^{n}{\mathbf z}_{lij}\), l = 1, 2. Note that the group means \(\bar {\mathbf {z}}_1\) and \(\bar {\mathbf {z}}_2\), as well as all the estimated parameters (μ 1, μ 2, W 1, W 2, B 1 and B 2) are available in the database plant.Rdata.

The smoothing parameters h 1 and h 2 in the two populations can be estimated as in (3.36), using the function hopt:
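A sketch of such a function; the explicit rule below is the common normal-kernel choice and is assumed here to correspond to (3.36):

```r
# Smoothing parameter estimate for p variables and m sources,
# assumed rule: h = (4 / ((2p + 1) m))^(1 / (p + 4))
hopt <- function(p, m) {
  (4 / ((2 * p + 1) * m))^(1 / (p + 4))
}

h1 <- hopt(3, nrow(gmu1))
h2 <- hopt(3, nrow(gmu2))
```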

Given the available measurements, the Bayes factor can be calculated as in (4.15) using the function two.level.mvk.inv.BF provided in the supplementary materials on the book’s website.
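A sketch of the call, with the argument order taken from Sect. 4.5 and the objects defined above:

```r
BF <- two.level.mvk.inv.BF(y, gmu1, gmu2, W1, W2, B1, B2, h1, h2)
BF
```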

The Bayes factor represents moderate support for the proposition according to which the seized plant is drug type Cannabis rather than fiber type Cannabis.

4.4.2.3 Assessing Model Performance

One way to investigate the performance of the two models described in Sects. 4.4.2.1 and 4.4.2.2, denoted here Method 1 and Method 2, is to calculate a Bayes factor for all available measurements on items from population p 1 (drug type). One would expect to obtain BFs greater than 1 (see Table 4.1). Clearly, one should also consider BF computations for all measurements on items from population p 2 (fiber type). In the latter case, BFs smaller than 1 would be expected (see Table 4.2).

Table 4.1 Bayes factor values for items of population 1 (Example 4.9 and 4.10) obtained using (4.13) (Method 1) and (4.15) (Method 2)
Table 4.2 Bayes factor values for items of population 2 (Example 4.9 and 4.10) obtained using (4.13) (Method 1) and (4.15) (Method 2)

4.5 Summary of R Functions

The R functions outlined below have been used in this chapter.

Functions Available in the Base Package

  • apply: Applies a function to the margins (either rows or columns) of a matrix.

  • colMeans: Forms column means for numeric arrays (or data frames).

  • d<name of distribution> (e.g., dnorm): Calculates the density for many parametric distributions.

  • More details can be found in the R help system, e.g., by calling help.start().

Functions Available in Other Packages

  • dbbinom and ddirmnom in the package extraDistr: Calculate the density of a beta-binomial distribution and that of a Dirichlet-multinomial distribution, respectively.

  • dstp and dmvt in the package LaplacesDemon: Calculate the density of a non-central Student t distribution and of a non-central multivariate Student t distribution, respectively.

  • fitdist and fitDirichlet in the package SHELF: Fit a parametric distribution starting from elicited probabilities and a Dirichlet distribution from the elicited beta distributions for a set of proportions, respectively.

Functions Developed in the Chapter

  • beta_prior: Calculates the hyperparameters α and β of a beta distribution Be(α, β) starting from the prior mean m and the prior variance v.

  • Usage: beta_prior(m,v).

  • Arguments: m, the prior mean; v, the prior variance.

  • Output: A vector of values, the first is α, the second is β.

  • hopt: Calculates the estimates \(\hat h\) of the smoothing parameter h.

  • Usage: hopt(p,m).

  • Arguments: p, the number of variables; m, the number of sources.

  • Output: A scalar value.

  • kn1: Computes the kernel density estimation (numerator).

  • Usage: kn1(x,pop1,sk1).

  • Arguments: x, a vector of available measurements; pop1, a vector of measurements of drug intensities on banknotes from drug trafficking where the kernel is centered; sk1, the variance \(h_1^2s_1^2\) of the kernel, where h 1 is the smoothing parameter and \(s_1^2\) is the sample variance of the available measurements.

  • Output: A scalar value.

  • post_distr: Computes the posterior distribution \(\mathrm {N}(\mu _x,\tau ^2_x)\) of a normal mean θ, with X ∼N(θ, σ 2) and θ ∼N(μ, τ 2).

  • Usage: post_distr(sigma,n,barx,pm,pv).

  • Arguments: sigma, the variance σ 2 of the observations; n, the number of observations; barx, the sample mean \(\bar x\) of the observations. pm, the mean μ of the prior distribution N(μ, τ 2); pv, the variance τ 2 of the prior distribution N(μ, τ 2).

  • Output: A vector of two values, the first is the posterior mean μ x, the second is the posterior variance \(\tau ^2_x\).

  • two.level.mv.WB: Computes the estimate of the overall mean μ, the group means \(\bar {\mathbf {z}}_{i}\), the within-group covariance matrix W, and the between-group covariance matrix B.

  • Usage: two.level.mv.WB(population, variables, grouping.variable).

  • Arguments: population, a data frame with N rows and k columns collecting measurements on m sources with n i items for each source, i = 1, …, m; variables, a vector containing the column indices of the variables to be used; grouping.variable, a scalar specifying the variable that is to be used as the grouping factor.

  • Output: The group means \(\bar {\mathbf z}_i\), the estimated overall mean \(\hat {\boldsymbol \mu }\), the estimated within-group covariance matrix \(\hat W\), the estimated between-group covariance matrix \(\hat B\).

  • two.level.mvn.inv.BF: Computes the BF for investigative purposes from a two-level model where both the within-source variability and the between-source variability are normally distributed.

  • Usage: two.level.mvn.inv.BF(y, W1, W2, B1, B2, mu1, mu2, variables).

  • Arguments: y, a (n × p) matrix of measurements; W 1 and W 2, the within-source covariance matrices; B 1 and B 2, the between-source covariance matrices; the overall group means μ 1 and μ 2; variables, a vector containing the column indices of the variables to be used.

  • Output: A scalar value.

  • two.level.mvk.inv.BF: Computes the BF for investigative purposes from a two-level model where the within-source variability is assumed to be normally distributed, while the between-source variability is modeled by a kernel density.

  • Usage: two.level.mvk.inv.BF(y,gmu1,gmu2,W1,W2,B1,B2,h1,h2).

  • Arguments: y, a (n × p) matrix of measurements; gmu1 and gmu2, the group means \(\bar {\mathbf {z}}_{1i}\) and \(\bar {\mathbf {z}}_{2i}\); W 1 and W 2, the within-source covariance matrices; B 1 and B 2, the between-source covariance matrices; h 1 and h 2, the smoothing parameters h 1 and h 2.

  • Output: A scalar value.

Published with the support of the Swiss National Science Foundation (Grant no. 10BP12_208532/1).