1 Introduction

In the preface of Lindley (1965) (p. xi), D. V. Lindley stated that

“ ...hypothesis testing looms large in standard statistical practice, yet scarcely appears as such in the Bayesian literature.”

Since then things have changed and, in the last sixty years, there have been several attempts to build a measure of evidence that covers, in a Bayesian context, the role that the p-value has played in the frequentist setting. A prominent example is the decision test based on the Bayes Factor and its extensions (see, for instance, Berger (1985)).

As an alternative to the Bayes Factor, another Bayesian evidence measure is provided in Pereira and Stern (1999), upon which the testing procedure Full Bayesian Significance Test (FBST) is based. For a recent survey on the FBST see Pereira and Stern (2020).

The main aim of this paper is to give a contribution to the testing procedure of precise hypotheses. In particular, the proposed Bayesian measure of evidence, called Bayesian Discrepancy Measure (BDM), gives an absolute evaluation of a hypothesis H in light of prior knowledge about the parameter and observed data. The proposed measure of evidence has the desired properties of invariance under reparametrization and consistency for large samples.

Our starting point is the idea that a hypothesis may be more or less supported by the available evidence contained in the posterior distribution.

We do not adopt the hypothesis testing approach in which a hypothesis can be rejected only by comparing it with another hypothesis (Neyman-Pearson in the frequentist perspective, the Bayes factor in the Bayesian one), but rather the approach proposed by Fisher (see Christensen (2005) and Deni (2004)). Reference is made to a precise hypothesis H and no alternative is considered against it. In this view, different hypotheses made by several experts can be evaluated using the information coming from the same data, and some can be accepted while others are not. In this respect, in a broad sense, we can say that we return to Fisher’s original idea of pure significance, according to which “Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis” (Fisher 1925).

The proposed measure of evidence can be seen as a Bayesian tool for model checking, that is, as a technique that can aid in the actual specification of a model, without the need to make explicit reference to alternative models or hypotheses. For an extensive discussion of this point and the difference with the procedure of Bayesian model selection see O’Hagan (2003).

The structure of the paper is as follows. In Sect. 2 the definition of the proposed measure is presented for a scalar parameter of interest, both in the absence or presence of nuisance parameters. In Sect. 3 different illustrative examples are discussed, involving one or two independent populations. Finally, in Sect. 4 we make a comparison between the Bayesian Discrepancy Test and the Full Bayesian Significance Test which is based on the e-value, a well-known Bayesian evidence index used to test sharp hypotheses. The last section contains conclusions and directions for further research.

2 The Bayesian discrepancy measure

Let \((\mathcal {X}, \mathcal {P}^{X}_{\varvec{\theta }}, \varvec{\Theta })\) be a parametric statistical model where \(X \in \mathcal {X} \subset \mathbb {R}^k\), \(\mathcal {P}^{X}_{\varvec{\theta }}=\{f(x \vert \varvec{\theta })\ \vert \ \varvec{\theta } \in \varvec{\Theta }\}\) is a class of probability distributions (Lebesgue integrable) defined on \(\mathcal {X}\), depending on an unknown vector of continuous parameters \(\varvec{\theta } \in \varvec{\Theta }\), an open subset of \(\mathbb {R}^p\). Assume that

  1. (a)

    the model is identifiable;

  2. (b)

    \(f(x \vert \varvec{\theta })\) has support not depending on \(\varvec{\theta }\), \(\forall \ \varvec{\theta } \in \varvec{\Theta }\);

  3. (c)

    the log-likelihood function is at least twice differentiable in \(\varvec{\theta }\);

  4. (d)

    the operations of integration and differentiation with respect to \(\varvec{\theta }\) can be exchanged.

We assume a prior probability density \(g_0(\varvec{\theta })\) following Cromwell’s Rule, which states that “it is inadvisable to attach probabilities of zero to uncertain events, for if the prior probability is zero so is the posterior, whatever be the data. A probability of one is equally dangerous because then the probability of the complementary event will be zero” (see Section 6.2 in Lindley (1991)). We therefore assume that \(g_0(\varvec{\theta }) >0, \, \forall \varvec{\theta }\), since this assumption is needed to establish consistency (see Proposition 1 in the next section).

First, we discuss the case of a scalar parameter. Then we discuss the case of a scalar parameter of interest in the presence of nuisance parameters.

2.1 The Bayesian discrepancy measure for a scalar parameter

In this section we assume that \(k=p=1\). Given an iid random sample \(\varvec{x}=(x_1,\ldots ,x_n)\) from \(\mathcal {P}^{X}_{\theta }\), let \(L(\theta \vert \varvec{x})\) be the corresponding likelihood function based on data \(\varvec{x}\) and let \(g_0(\theta )\) be a continuous prior distribution on \(\Theta \subseteq \mathbb {R}\). The posterior probability density for \(\theta\) given \(\varvec{x}\) is then

$$\begin{aligned} g_1(\theta \vert \varvec{x}) \propto g_0(\theta ) \, L(\theta \vert \varvec{x}). \end{aligned}$$

Moreover, given the posterior distribution function \(G_1(\theta \vert \varvec{x})\), the posterior median is any real number \(m_1\) which satisfies the inequalities \(G_1(m_1\vert \varvec{x}) \ge \frac{1}{2}\) and \(G_1^-(m_1 \vert \varvec{x}) \le \frac{1}{2}\), where \(\displaystyle G_1^-(m_1 \vert \varvec{x})=\lim _{\theta \uparrow m_1} G_1(\theta \vert \varvec{x})\). In the case in which \(G_1(\cdot \vert \varvec{x})\) is continuous and strictly increasing we have \(m_1= G_1^{-1}(\frac{1}{2} \vert \varvec{x})\). Under the assumptions made at the beginning of Sect. 2, the posterior median \(m_1\) is uniquely defined.

We are interested in testing the precise hypothesis

$$\begin{aligned} H: \theta = \theta _H. \end{aligned}$$
(1)

In order to measure the discrepancy of the hypothesis (1) w.r.t. the posterior distribution, in the case \(\Theta =\mathbb {R}\), we consider the following two intervals:

  1. 1.

    the discrepancy interval

    $$\begin{aligned} I_H = \left\{ \begin{array}{ll} (m_1,\theta _H) &{} \text { if } \quad m_1 < \theta _H \\ \{m_1\} &{} \text { if } \quad m_1 = \theta _H, \\ (\theta _H,m_1) &{} \text { if } \quad m_1 > \theta _H \\ \end{array} \right. \end{aligned}$$
    (2)
  2. 2.

    the external interval

    $$\begin{aligned} I_E = \left\{ \begin{array}{ll} (\theta _H,+\infty ) &{} \text { if } \quad m_1< \theta _H \\ (-\infty ,\theta _H) &{} \text { if } \quad \theta _H < m_1. \\ \end{array} \right. \end{aligned}$$
    (3)

When \(m_1=\theta _H\), the external interval \(I_E\) can be \((-\infty , m_1)\) or \((m_1,+\infty )\). Note that, by construction, \(\mathbb {P}(\theta \in I_H \cup I_E \vert \varvec{x})=\frac{1}{2}\) (see Fig. 1). If the support of the posterior is a proper subset of \(\mathbb {R}\), the intervals \(I_H\) and \(I_E\) are defined accordingly.

Fig. 1

Posterior density \(g_1(\theta \vert \varvec{x})\), the corresponding discrepancy interval \(I_H\) and external interval \(I_E\) when \(\theta _H < m_1\) ([A]) and \(\theta _H > m_1\) ([B])

Definition 1

Given the posterior distribution function \(G_1(\theta \vert \varvec{x})\), we define the Bayesian Discrepancy Measure of the hypothesis H as

$$\begin{aligned} \delta _H = 2 \, \mathbb {P}(\theta \in I_H \vert \varvec{x}) = 2 \int _{I_H} d G_1(\theta \vert \varvec{x}). \end{aligned}$$
(4)

The measure can be also computed by means of the external interval as

$$\begin{aligned} \delta _H = 1 - 2 \, \mathbb {P}(\theta \in I_E \vert \varvec{x}) = 1 - 2 \int _{I_E} d G_1(\theta \vert \varvec{x}), \end{aligned}$$
(5)

which can also be written as

$$\begin{aligned} \delta _H = 1 - 2 \min \{G_1^{-}(\theta _H \vert \varvec{x}),1-G_1(\theta _H \vert \varvec{x})\}, \end{aligned}$$
(6)

where \(\displaystyle G_1^-(\theta _H \vert \varvec{x})=\lim _{\theta \uparrow \theta _H} G_1(\theta \vert \varvec{x})\). In our case, since \(G_1(\theta _H \vert \varvec{x})\) is continuous, this simplifies to

$$\begin{aligned} \delta _H = 1 - 2 \min \{G_1(\theta _H \vert \varvec{x}),1-G_1(\theta _H \vert \varvec{x})\}. \end{aligned}$$
(7)

Formulations (6) and (7) have the advantage of not involving the posterior median in the integral computation. Furthermore, one can interpret the quantity \(\min \{G_1(\theta _H \vert \varvec{x}),1-G_1(\theta _H \vert \varvec{x})\}\) as the posterior probability of a “tail” event concerning only the precise hypothesis H. Doubling this “tail” probability, related to the precise hypothesis H, one gets a posterior probability assessment about how “central” the hypothesis H is and hence how it is supported by the prior and the data.
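As a minimal illustration of formula (7), the sketch below (in Python) computes \(\delta _H\) for a posterior with an available closed-form distribution function; the Beta posterior and the numbers used are purely illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of formula (7): delta_H = 1 - 2 * min{G1(theta_H), 1 - G1(theta_H)}.
# The Beta posterior below (Bernoulli model, conjugate prior) is purely illustrative.
from scipy import stats

def bdm(posterior, theta_H):
    """Bayesian Discrepancy Measure for a scalar parameter, given a frozen posterior."""
    G1 = posterior.cdf(theta_H)
    return 1.0 - 2.0 * min(G1, 1.0 - G1)

a0, b0, successes, n = 0.5, 0.5, 60, 100                # hypothetical prior and data
posterior = stats.beta(a0 + successes, b0 + n - successes)
print(bdm(posterior, theta_H=0.5))                      # evidence against H: theta = 0.5
```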

It is important to highlight that the hypothesis H induces the following partition

$$\begin{aligned} \big \{ \Theta _a = (-\infty , \theta _H),\ \Theta _H = \{\theta _H\},\ \Theta _b = (\theta _H, \infty )\big \} \end{aligned}$$
(8)

of the parameter space \(\Theta\). Then formulations (6) and (7) can be equivalently expressed as

$$\begin{aligned} \delta _H = \; 1 - 2 \cdot \min _{a,b} \displaystyle \big \{ {\mathbb {P}}(\theta \in \Theta _a \vert \varvec{x}) \,, \, {\mathbb {P}}(\theta \in \Theta _b \vert \varvec{x}) \big \} \,. \end{aligned}$$
(9)

The last formula can be naturally extended to the case where, besides the scalar parameter of interest, nuisance parameters are also present. This issue will be developed in Sect. 2.2.

The following properties apply to the BDM, for a scalar parameter \(\theta\).

Proposition 1

  1. (i)

    \(\delta _H\) always exists and, by construction, \(\delta _H \in [0, 1]\);

  2. (ii)

    \(\delta _H\) is invariant under invertible monotonic transformations of the parameter \(\theta\);

  3. (iii)

    if \(\theta\) is an a.c. random variable, \(\theta ^*\) is the true value of the parameter and \(\theta ^*=\theta _H\), then \(\delta _{H}\) converges asymptotically to a \(Unif(\cdot \vert 0,1)\). Otherwise, if \(\theta ^* \ne \theta _H\), then \(\delta _{H} \, {\mathop {\rightarrow }\limits ^{\textit{p}}}\, 1\) (consistency property).

Proof

(i):

The first property follows immediately from the fact that in (4) the posterior probability \(\mathbb {P}(\theta \in I_H \vert \varvec{x}) \in \Big [0,\frac{1}{2}\Big ]\).

(ii):

Let \(\lambda =\lambda (\theta )\) be an invertible monotonic transformation of the parameter \(\theta\) and let \(K_1(\cdot )\) be the cumulative distribution function of the parameter \(\lambda\). We denote with \(\lambda _H=\lambda (\theta _H)\) and we notice that \(m'_1=\lambda (m_1)\) thanks to the monotonic invariance of the median. Suppose, for simplicity, that \(\theta _H>m_1\). Then

$$\begin{aligned} \delta _H=2\ \int _{m_1}^{\theta _H} dG_1(\theta \vert \varvec{x})\ = 2\ \Big \vert \int _{m'_1}^{\lambda _H} dK_1(\lambda \vert \varvec{x})\Big \vert . \end{aligned}$$

Therefore, the invariance of the BDM follows immediately from the invariance of the median under invertible monotonic transformations. Notice that if instead of the median \(m_1\) we consider, for example, the posterior mean \(E(\theta \vert \varvec{x} )\), which is not invariant under invertible monotonic reparametrizations, the property will not hold in general. Moreover, \(E(\theta \vert \varvec{x} )\) for some models may not even exist.

(iii):

We first examine the first part of the statement, in which \(\theta ^*=\theta _H\). Let \(\hat{\theta }\) be the maximum likelihood estimator of \(\theta\) and let \(J(\hat{\theta })\) be the observed Fisher information. Under suitable regularity and technical conditions (see for instance Section 7, p. 129 in Lindley (1965) and Section 5.3.2, p. 287 in Bernardo and Smith (1994)), the asymptotic distribution of the “normalized” random quantity \(W = \sqrt{J(\hat{\theta })}(\theta -\hat{\theta })\) is standard normal, both in the posterior, for fixed data and random \(\theta\), and in the sampling distribution, for fixed \(\theta\) and random data. We have

$$\begin{aligned} \delta _H = 1 - 2\min \{ G_1(\theta _H \vert \varvec{x}), 1- G_1(\theta _H \vert \varvec{x})\}, \end{aligned}$$
(10)

where

$$\begin{aligned} G_1(\theta _H \vert \varvec{x})= P(W \le \sqrt{J(\hat{\theta })} (\theta _H- \hat{\theta }) \mid \varvec{X}= \varvec{x}). \end{aligned}$$
(11)

Since W is asymptotically standard normal, then \(G_1(\theta _H \vert \varvec{x})\) is asymptotically \(\Phi \left( \sqrt{J(\hat{\theta })} (\theta _H- \hat{\theta }) \right)\) (a function of the data through \(\hat{\theta }\)). But also, in the sampling distribution given \(\theta ^* = \theta _H\), \(\sqrt{J(\hat{\theta })} (\theta _H- \hat{\theta })\) is asymptotically standard normal and thus, in view of the probability integral transform, \(G_1(\theta _H \vert \varvec{X})\) is asymptotically uniform on [0, 1] in this sampling distribution. Then

$$\begin{aligned} \mathbb {P}(\delta _H \le t \vert \theta _H) = \mathbb {P} \left( \frac{1}{2}(1-t) \le G_1(\theta _H \vert \varvec{X}) \le \frac{1}{2} (1+t) \vert \theta _H \right) \approx t, \end{aligned}$$

so that \(\delta _H\) is asymptotically uniform under \(\theta _H\). If, instead, \(\theta ^* \ne \theta _H\) and \(n \rightarrow \infty\), under suitable regularity conditions (see for instance Section 7, p. 129 in Lindley (1965)) it is well known that \(g_1(\theta \vert \varvec{x})\) concentrates in a neighbourhood of \(\theta ^*\) whose size is of order \(n^{-\frac{1}{2}}\). Then, from (5), since the tail event \(\theta \in I_E\) has vanishingly small probability, we have that \(\lim _{n \rightarrow \infty } \delta _H=1\).

\(\square\)
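Property (iii) can also be checked empirically. The following sketch is a minimal simulation assuming an Exponential model with mean \(\theta\) and the prior \(g_0(\theta )\propto \theta ^{-1}\), so that the posterior of \(\theta\) is Inverse-Gamma with shape n and scale \(n\bar{x}\); the sample size, the alternative true value and the seed are our own illustrative choices.

```python
# Sketch: empirical check of Proposition 1 (iii). Exponential(mean theta) model with
# prior g0(theta) proportional to 1/theta, hence posterior theta | x ~ InvGamma(n, sum(x)).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def delta_H(x, theta_H):
    G1 = stats.invgamma(a=len(x), scale=x.sum()).cdf(theta_H)   # posterior CDF at theta_H
    return 1 - 2 * min(G1, 1 - G1)

n, reps, theta_H = 100, 5000, 1.0
# theta* = theta_H: delta_H should be approximately Uniform(0, 1)
d_null = [delta_H(rng.exponential(scale=theta_H, size=n), theta_H) for _ in range(reps)]
# theta* != theta_H: delta_H should concentrate near 1
d_alt = [delta_H(rng.exponential(scale=1.5, size=n), theta_H) for _ in range(reps)]
print(np.mean(d_null), np.mean(d_alt))   # roughly 0.5 and close to 1
```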

As pointed out before, the further \(\theta _H\) is from the posterior median \(m_1\) of the distribution function \(G_1(\theta \vert \varvec{x})\), the closer \(\delta _H\) is to 1. It can then be said that H does not conform to \(G_1(\theta \vert \varvec{x})\). On the contrary, the smaller \(\delta _H\) the stronger is the evidence in favor of H. Following this idea, we can construct a procedure to evaluate (and possibly reject) the hypothesis H, using the evidence measure \(\delta _H\).

Definition 2

The Bayesian Discrepancy Test (BDT) is the procedure for evaluating a hypothesis H based on the Bayesian Discrepancy Measure (BDM).

High values of \(\delta _H\) provide strong evidence against the hypothesis H. On the other hand, if \(\delta _H\) is small, the data are consistent with H.

Summarizing, when H is true, then, for large n, \(\delta _H\) is roughly equally likely to fall anywhere between 0 and 1. By contrast, when H is false, \(\delta _H\) is more likely to be near 1 than near 0. As with other measures of evidence (such as the e-value of the Full Bayesian Significance Test or the frequentist p-value), a threshold could be chosen in order to interpret the observed value of \(\delta _H\). However, in the direction recommended in the ASA statement (see Wasserstein and Lazar (2016)) and in view of the debate on hypothesis testing (Benjamin et al. 2018; Benjamin and Berger 2019) and the recent studies about the reproducibility of experiments (Collaboration 2015; Johnson et al. 2017), we agree with Fisher (1973) that “no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas”. Given the critical points related to the choice of a threshold, we think it is important to look for an applied measure of evidence that pushes the researcher to think more about the specific problem, and that avoids the use of standard recipes.

2.2 The Bayesian discrepancy measure in presence of nuisance parameters

Suppose that \(p \ge 2\) and \(k \ge 1\). Let \(\varphi = \varphi (\varvec{\theta })\) be a scalar parameter of interest, where \(\varphi : \varvec{\Theta } \rightarrow \Phi \subseteq \mathbb {R}\). Let us further consider a bijective reparametrization \(\varvec{\theta } \Leftrightarrow (\varphi , \varvec{\zeta })\), where \(\varvec{\zeta } \in \varvec{Z} \subseteq \mathbb {R}^{p-1}\) denotes an arbitrary nuisance parameter, which is determined on the basis of analytical convenience (note that the value of the evidence measure is invariant with respect to the choice of the nuisance parameter). We consider hypotheses that can be expressed in the form

$$\begin{aligned} H: \varphi = \varphi _H, \end{aligned}$$
(12)

where \(\varphi _H\) is a known value representing the hypothesis of interest. The transformation \(\varphi\) must be such that, for all \(\varvec{\theta } \in \varvec{\Theta }\) and for all \(\varphi _H \in \Phi\), it can always be assessed whether \(\varphi\) is strictly smaller than, strictly larger than or equal to \(\varphi _H\) (i.e. either \(\varphi < \varphi _H\), \(\varphi > \varphi _H\), or \(\varphi = \varphi _H\)). Hypothesis (12) and transformation \(\varphi\) then induce the partition

$$\begin{aligned} \begin{array}{ll} {\varvec{\Theta }}_a &{}= \big \{ \varvec{\theta } \in \varvec{\Theta }: \varphi < \varphi _H \big \}\\ {\varvec{\Theta }}_H &{}= \big \{ \varvec{\theta } \in \varvec{\Theta }: \varphi = \varphi _H \big \}\\ {\varvec{\Theta }}_b &{}= \big \{ \varvec{\theta } \in \varvec{\Theta }: \varphi > \varphi _H \big \}. \end{array} \end{aligned}$$
(13)

We call any hypothesis of type (12) which identifies a partition of the form (13) a partitioning hypothesis. It is easy to verify that many commonly used hypotheses are partitioning. In this paper we only consider hypotheses of this nature. In this setting, we express the BDM as

$$\begin{aligned} \begin{array}{ll} \delta _H &{}=\; \displaystyle 1 - 2 \cdot \min _{a,b} \big \{ {\mathbb {P}}(\varvec{\theta } \in \varvec{\Theta }_a \vert \varvec{x}) \,, \, {\mathbb {P}}(\varvec{\theta } \in \varvec{\Theta }_b \vert \varvec{x}) \big \} \\ &{}=\; \displaystyle 1 - 2 \cdot \int _{I_E} g_1(\varvec{\theta } \vert \varvec{x}) \, \textrm{d}\varvec{\theta }, \, \end{array} \end{aligned}$$
(14)

where the external set is given by

$$\begin{aligned} I_E \; = \; \arg \min _{a,b} \big \{ {\mathbb {P}}(\varvec{\theta } \in \varvec{\Theta }_a \vert \varvec{x}) \,, \, {\mathbb {P}}(\varvec{\theta } \in \varvec{\Theta }_b \vert \varvec{x}) \big \} \;. \; \end{aligned}$$
(15)

In the particular scenario where the marginal posterior

$$\begin{aligned} h_1(\varphi \vert \varvec{x}) \; = \; \int _{\varphi ( \varvec{\theta })= \varphi } g_1(\varvec{\theta } \vert \varvec{x}) \textrm{d}\varvec{\theta } \;, \quad \forall \varphi \in \Phi \,, \end{aligned}$$

of the parameter of interest \(\varphi\) can be computed in a closed form, the hypothesis (12) can be easily treated using the methodologies seen in Subsection 2.1, i.e. the BDM is computed by means of formula (4) or (5) applied to the marginal.

The properties reported in Proposition 1 extend naturally to the setting just presented.

3 Illustrative examples

The simplicity of the BDT is highlighted by the following examples, some of which deal with cases not usually considered in the literature. Examples 1 and 2 focus on a scalar parameter of interest, while Examples 3, 4, 5, 6, 7 also contain nuisance parameters.

In all examples we have adopted a Jeffreys’ prior (see Yang and Berger (1996) for a catalog of non-informative priors) for simplicity. However, other objective priors and, in the presence of substantive prior information, informative priors could equally be used.

3.1 Examples of the univariate parameter case

Example 1

Exponential distribution Let \(\varvec{x}=(x_1, \dots , x_n)\) be an iid sample of size n from the Exponential distribution \(X \sim Exp\big (x \vert \theta ^{-1} \big )\), with \(\theta \in {\mathbb {R}}^+.\) We are interested in the hypothesis \(H: \theta =\theta _H\). Assuming a Jeffreys’ prior for \(\theta\), i.e. \(g_0(\theta ) \propto \theta ^{-1}\), the posterior distribution is given by \(g_1(\theta \vert \varvec{x}) \propto \theta ^{-n-1} \exp \{- n \bar{x} \cdot \theta ^{-1} \}\), with \(\bar{x}\) the sample mean.

Figure 2 shows the posterior density function as well as the discrepancy and the external intervals for \(H:\theta = \theta _H = 2.4\) and the MLE \(\bar{x} = 1.2\) for three sample sizes [A] \(n =6\), [B] \(n = 12\), [C] \(n = 24\). In [A] we have a posterior median \(m_1=1.27\) and \(\delta _H =0.832\), while in [B] \(m_1=1.23\) and \(\delta _H =0.960\), in [C] \(m_1=1.22\) and \(\delta _H =0.997\).

While in case [A] the data do not contradict H sufficiently, in case [B] there is weak evidence against H, which becomes stronger in [C].
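These values can be reproduced in a few lines; the sketch below simply evaluates the Inverse-Gamma posterior given above (the code is illustrative, not the authors’ implementation).

```python
# Sketch: Example 1. With Jeffreys' prior the posterior of theta is InvGamma(n, n*xbar).
from scipy import stats

theta_H, xbar = 2.4, 1.2
for n in (6, 12, 24):                                  # cases [A], [B], [C]
    post = stats.invgamma(a=n, scale=n * xbar)
    G1 = post.cdf(theta_H)
    delta = 1 - 2 * min(G1, 1 - G1)
    print(n, round(post.median(), 2), round(delta, 3))
# expected output close to (6, 1.27, 0.832), (12, 1.23, 0.960), (24, 1.22, 0.997)
```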

Fig. 2

Posterior density function \(g_1(\theta \vert n\bar{x})\) and intervals \(I_H = (m_1,\theta _H)\) and \(I_E = (\theta _H, \infty )\), using data from Example 1

Note that in all scenarios considered, we find the following relation between \(\delta _H\) and the p-value,

$$\begin{aligned} {\textit{p-value}}=1 - \delta _H \end{aligned}$$
(16)

(in [A] \(\delta _H = 0.832\) and p-value\(= 0.168\), in [B] \(\delta _H = 0.96\) and p-value\(= 0.04\), while in [C] \(\delta _H = 0.997\) and p-value\(= 0.003\)). This result depends clearly on the use of the Jeffreys’ prior, which is a matching prior for a scalar parameter (see Ruli and Ventura (2021)).
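Relation (16) can be checked numerically; the sketch below uses the equal-tailed p-value based on the exact pivot \(2n\bar{x}/\theta \sim \chi ^2_{2n}\), which is our reading of the classical test being matched here.

```python
# Sketch: equal-tailed p-value from the pivot 2*n*xbar/theta_H ~ chi-square(2n),
# to be compared with 1 - delta_H in cases [A], [B], [C] of Example 1.
from scipy import stats

theta_H, xbar = 2.4, 1.2
for n in (6, 12, 24):
    F = stats.chi2(df=2 * n).cdf(2 * n * xbar / theta_H)
    print(n, round(2 * min(F, 1 - F), 3))    # approximately 0.168, 0.040, 0.003
```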

Remark 1

The fact that classical and Bayesian procedures, under certain conditions, produce the same conclusions is well known (see, for instance, Lindley (1965)). The linear relationship (16) also occurs in other simple cases.

Finally, to conclude Example 1, it is useful to show the trend of the BDM when varying \(n=1,2,\dots ,25\) for six values of the MLE: \((a)\ 0.8\), \((b)\ 1.2\), \((c)\ 1.6\) (case [A]) and \((d)\ 4.0\), \((e)\ 3.6\), \((f)\ 3.2\) (case [B]), see Fig. 3. In order to explain the difference between the BDM trends in cases [A] and [B], consider that:

  1. (i)

    in case [A] the posterior median \(m_1 < \theta _H = 2.4\), whereas in case [B] \(m_1 > \theta _H = 2.4\);

  2. (ii)

    \(\delta _H\) is monotonically increasing, both with respect to n, and with respect to the distance \(\vert m_1 - \theta _H \vert\);

  3. (iii)

    the posterior \(g_1\) always has a positive asymmetry, which decreases as n increases;

  4. (iv)

    the trend difference of the BDM in cases [A] and [B] depends on the fact that the posterior \(g_1\) has ‘small’ tails on the left-hand side of \(m_1\) and ‘large’ tails on the right-hand side.

Fig. 3

BDM for n increasing and for different values of the MLE. Case [A] with MLE \(= 0.8\ ({\textbf {a}}),\ 1.2\ ({\textbf {b}}),\ 1.6\ ({\textbf {c}})\) and case [B] with MLE \(= 3.2\ ({\textbf {f}}),\ 3.6\ ({\textbf {e}}),\ 4\ ({\textbf {d}})\)

Moving forward in the discussion, in order to highlight the evaluative nature of the BDT, it is worth pointing out that it allows the separate and simultaneous testing of \(\ell \ge 2\) hypotheses

$$\begin{aligned} H_j: \, \varphi = \varphi _j, \quad j=1,2,\dots , \ell , \end{aligned}$$
(17)

as shown in Example 2. Remember that with the comparative approach, among the \(\ell\) competing hypotheses, only one is accepted. On the contrary, under the evaluative approach, it may happen that several hypotheses are supported by the data, or even that all hypotheses must be rejected.

Example 2

- Evaluation of some hypotheses made by several experts (Bernoulli distribution) In the 1700s, several hypotheses \(H_j: \theta = \theta _j\) were formulated about the birth masculinity rate \(\theta =\frac{M}{M+F}\). Among them we consider \(\theta _1 = \frac{1}{2}\) (J. Bernoulli), \(\theta _2 = \frac{13}{25}\) (J. Arbuthnot), \(\theta _3 = \frac{1050}{2050}\) (J. P. Süssmilch), \(\theta _4 = \frac{23}{45}\) (P. S. Laplace). We assume that the gender of each newborn is modeled as a \(Bin(\cdot \vert 1, \theta )\). Then, using data recorded in 1710 in London (see, for instance, Spiegelhalter (2019)), with 7640 males and 7288 females (the MLE is \(\hat{\theta } = 0.512\)) and assuming a uniform \(Beta(\theta \vert 1, 1)\) prior, we compute \(\delta _{H_j}\) using the Normal asymptotic approximation

$$\begin{aligned} \delta _{H_j} \cong 1 - 2\cdot \displaystyle \int _{I_E^j} \tilde{g}_1\big (\theta \vert \hat{\theta }, \textstyle {\frac{1}{n}}\hat{\theta }(1-\hat{\theta }) \big ) \textrm{d}\theta , \quad j=1,2,3,4, \end{aligned}$$

where \(\tilde{g}_1\) denotes the Normal density. Since \(\delta _{H_1}=0.996\), \(\delta _{H_2}=0.955\), \(\delta _{H_3}=0.079\), \(\delta _{H_4}=0.132\), we can conclude that there is sufficient evidence against the first two hypotheses, while there is not enough evidence against the hypotheses made by Süssmilch and Laplace.
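A sketch of this computation, using the Normal asymptotic approximation stated above (the implementation details are our own):

```python
# Sketch: Example 2, BDM for several precise hypotheses under the Normal
# asymptotic approximation of the posterior of theta.
import numpy as np
from scipy import stats

males, females = 7640, 7288
n = males + females
theta_hat = males / n
approx_post = stats.norm(loc=theta_hat, scale=np.sqrt(theta_hat * (1 - theta_hat) / n))

hypotheses = {"Bernoulli": 1 / 2, "Arbuthnot": 13 / 25,
              "Suessmilch": 1050 / 2050, "Laplace": 23 / 45}
for name, theta_j in hypotheses.items():
    G1 = approx_post.cdf(theta_j)
    print(name, round(1 - 2 * min(G1, 1 - G1), 3))
# approximately 0.996, 0.955, 0.079, 0.132
```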

3.2 Examples of the more general case

The examples presented hereafter can be divided into tests concerning a parameter or a parametric function of a single population, and tests concerning the comparison of parameters of two independent populations.

3.2.1 Tests involving a single population

Example 3

- Test on the shape parameter, mean and variance of the Gamma distribution Let \({\varvec{x}} = (x_1,\dots ,x_n)\) be an iid sample of size n from \(X \sim Gamma\big ( x \vert \alpha , \beta \big )\), \((\alpha , \beta ) \in {\mathbb {R}}^+ \times {\mathbb {R}}^+\). We denote by \(m_g\) the geometric mean of \(\varvec{x}\). The likelihood function for \((\alpha ,\beta )\) is given by

$$\begin{aligned}L(\alpha , \beta \vert \varvec{x}) \; \propto \; \left( \displaystyle {\frac{\beta ^\alpha }{\Gamma (\alpha )}} \cdot m_g^\alpha \cdot e^{- \bar{x}\cdot \beta } \right) ^n . \end{aligned}$$

For the fictitious data \({\varvec{x}} = (0.8, 1.1, 1.2, 1.4, 1.8, 2, 4, 5, 8)\), we find that the MLEs are \(\hat{\alpha } = 1.921\) and \(\hat{\beta } = 0.7572\).

We are interested in testing the hypotheses [A] \(H_A: \alpha = \alpha _H\), with \(\alpha _H = 2.5\), [B] \(H_B: \mu = \mu _H\), with \(\mu _H = 6\), and [C] \(H_C: \sigma ^2 = \sigma ^2_H\), with \(\sigma ^2_H = 2\), where \(\mu = \displaystyle {\frac{\alpha }{\beta }}\) and \(\sigma ^2 = \displaystyle {\frac{\alpha }{\beta ^2}}\) denote the mean and the variance of X.

We suppose that the parameters \(\alpha \text { and } \beta\) are independent and we assume the Jeffreys’ prior for them (see Yang and Berger (1996)), i.e. \(g_0 ( \alpha , \beta ) \; = \; g_0^\alpha ( \alpha ) \cdot g_0^\beta ( \beta )\) where \(g_0^\alpha ( \alpha )\propto \sqrt{\alpha \cdot \psi ^{(1)} (\alpha ) - 1}\), \(g_0^\beta ( \beta ) \propto {\frac{1}{\beta }}\), and \(\psi ^{(1)} (\alpha ) = \sum _{j=0}^\infty (\alpha + j)^{-2}\) denotes the trigamma function. Then, the posterior for \((\alpha , \beta )\) is given by \(g_1(\alpha , \beta \mid {\varvec{x}}) \; = \; k\cdot g_0^\alpha ( \alpha ) \cdot g_0^\beta ( \beta ) \cdot L(\alpha , \beta \vert \varvec{x}),\) with normalizing constant k.

  • Case [A] The hypothesis \(H_A\) identifies the vertical straight line of equation \(\alpha = \alpha _H\) and two subsets \({\varvec{\Theta }}_a = \{ (\alpha , \beta ): \alpha < \alpha _H \}\) and \({\varvec{\Theta }}_b = \{ (\alpha , \beta ): \alpha > \alpha _H \}\) (see Fig. 4 [A]). Then we can compute

    $$\begin{aligned} {\mathbb {P}} \big ( (\alpha , \beta ) \in {\varvec{\Theta }}_b\ \vert \ \varvec{x}\big )&= \displaystyle \int _{\alpha _H}^\infty \int _0^\infty g_1(\alpha , \beta \mid {\varvec{x}}) \, \textrm{d} \beta \, \textrm{d}\alpha \\&= \displaystyle k \cdot \int _{\alpha _H}^\infty \int _0^\infty \sqrt{\alpha \cdot \psi ^{(1)} (\alpha ) - 1} \cdot {\frac{1}{\beta }} \left( \displaystyle {\frac{\beta ^\alpha }{\Gamma (\alpha )}} \cdot m_g^\alpha \cdot e^{- \bar{x}\cdot \beta } \right) ^n \, \textrm{d} \beta \, \textrm{d}\alpha \\&= \displaystyle k \cdot \int _{\alpha _H}^\infty \sqrt{\alpha \cdot \psi ^{(1)} (\alpha ) - 1} \cdot {\frac{\Gamma (n\alpha )}{\Gamma (\alpha )^n}} \cdot \left( {\frac{m_g}{n\, \bar{x}}} \right) ^{n\alpha } \, \textrm{d} \alpha \; = \; 0.215 \,, \end{aligned}$$

    and \(\delta _H = 0.570\), indicating that there is not enough evidence against \(H_A\) (a numerical check of this one-dimensional reduction is sketched after this list).

  • Case [B] The hypothesis \(H_B\) identifies the straight line of equation \(\beta = {\frac{1}{\mu _{H}}} \alpha\) in the \(\alpha \beta\)-plane (see Fig. 4 [B]) and the two subsets

    $$\begin{aligned} {\varvec{\Theta }}_c = \big \{ (\alpha , \beta ): \beta > {\frac{1}{\mu _{H}}} \alpha \big \} \quad \text {and} \quad {\varvec{\Theta }}_d= \big \{ (\alpha , \beta ): \beta < {\frac{1}{\mu _{H}}} \alpha \big \}. \end{aligned}$$

    We have

    $$\begin{aligned} {\mathbb {P}} \big ( (\alpha ,\beta ) \in {\varvec{\Theta }}_d\ \vert \ \varvec{x} \big ) \; = \; \displaystyle \int _{{\varvec{\Theta }}_d} g_1(\alpha , \beta \mid {\varvec{x}}) \, \textrm{d}\alpha \, \textrm{d} \beta \; = \; 0.012 \,, \end{aligned}$$

    and, since \(\delta _H = 0.976\), we have strong evidence against \(H_B\).

  • Case [C] The hypothesis \(H_C\) identifies the parabola of equation \(\beta = {\frac{1}{\sqrt{\sigma ^2_{H}}}} \sqrt{\alpha },\) in the \(\alpha \beta\)-plane (see Fig. 4 [C]), and the two subsets

    $$\begin{aligned} {\varvec{\Theta }}_e = \big \{ (\alpha ,\beta ): \beta > {\frac{1}{\sqrt{\sigma ^2_{H}}}} \sqrt{\alpha } \big \} \quad \text {and} \quad {\varvec{\Theta }}_f= \big \{ (\alpha , \beta ): \beta < {\frac{1}{\sqrt{\sigma ^2_{H}}}} \sqrt{\alpha } \big \}. \end{aligned}$$

    We have

    $$\begin{aligned} {\mathbb {P}} \big ( (\alpha , \beta ) \in {\varvec{\Theta }}_e \ \vert \ \varvec{x} \big ) \; = \; \displaystyle \int _{{\varvec{\Theta }}_e} g_1(\alpha , \beta \mid {\varvec{x}}) \, \textrm{d}\alpha \, \textrm{d} \beta \; = \; 0.078 \,. \end{aligned}$$

    Therefore \(\delta _H = 0.846\), and so we do not have strong evidence against \(H_C\).
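The one-dimensional reduction displayed in Case [A] can be evaluated numerically; the sketch below works on the log scale for stability, and the integration bounds are our own choices (the integrand is negligible outside them).

```python
# Sketch: Example 3, case [A]. beta has been integrated out analytically (see the
# derivation above), leaving a one-dimensional integral in alpha.
import numpy as np
from scipy.special import gammaln, polygamma
from scipy.integrate import quad

x = np.array([0.8, 1.1, 1.2, 1.4, 1.8, 2.0, 4.0, 5.0, 8.0])
n, xbar, mg = len(x), x.mean(), np.exp(np.log(x).mean())
alpha_H = 2.5

def integrand(a):
    # sqrt(alpha*trigamma(alpha) - 1) * Gamma(n*alpha)/Gamma(alpha)^n * (mg/(n*xbar))^(n*alpha)
    log_val = gammaln(n * a) - n * gammaln(a) + n * a * (np.log(mg) - np.log(n * xbar))
    return np.sqrt(a * polygamma(1, a) - 1.0) * np.exp(log_val)

lower = quad(integrand, 1e-6, alpha_H)[0]     # unnormalized P(alpha < alpha_H | x)
upper = quad(integrand, alpha_H, 60.0)[0]     # unnormalized P(alpha > alpha_H | x)
p_b = upper / (lower + upper)
print(round(p_b, 3), round(1 - 2 * min(p_b, 1 - p_b), 3))   # roughly 0.215 and 0.570
```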

Fig. 4

Posterior density function \(g_1(\alpha , \beta \vert {\varvec{x}})\) from Example 3 and corresponding sets of the induced partition in the cases [A], [B] and [C]

Example 4

- Test on the coefficient of variation for a Normal distribution Given an iid sample \({\varvec{x}} = (x_1,\dots ,x_n)\) from \(X \sim N \big ( x \vert \mu , \phi ^{-1} \big )\), the parameter of interest is \(\psi = \displaystyle {\frac{\sqrt{Var(X)}}{\mid \mathbb {E}(X) \mid }} =\displaystyle {\frac{1}{\mid \mu \mid \sqrt{\phi }} }\). We are interested in testing the hypothesis

$$\begin{aligned} H: \psi = \psi _H, \end{aligned}$$

with \(\psi _H = 0.1\). If we consider the Jeffreys’ prior \(g_0(\mu ,\phi ) \propto \phi ^{-1} \cdot {\textbf{1}}_{\mathbb {R}\times \mathbb {R}^+},\) the posterior distribution is the Normal-Gamma density

$$\begin{aligned} (\mu ,\phi ) \mid \varvec{x} \; \sim \; NG \big ( \mu , \phi \mid \eta , \nu , \alpha , \beta \big ) \,,\, \end{aligned}$$

with hyperparameters \(( \eta , \nu , \alpha , \beta )\), where \(\eta = \bar{x}\), \(\nu = n\), \(\alpha = {\frac{1}{2}}(n-1)\), \(\beta = {\frac{1}{2}}n s^2\), and density

$$\begin{aligned} g_1( \mu , \phi \mid \eta , \nu , \alpha , \beta )=\frac{\beta ^{\alpha }\sqrt{\nu } }{\Gamma (\alpha )\sqrt{2 \pi }} \phi ^{\alpha -1/2} e^{-\frac{\nu \phi }{2} (\mu - \eta )^2}e^{-\beta \phi }. \end{aligned}$$

We consider the particular case in which \(\bar{x} = 17\) and \(s^2 = 1.6\) (so that the MLE is \(\hat{\psi } = 0.074\)) with two samples of size \(n= 10\) (Fig. 5 [A]) and \(n= 40\) (Fig. 5 [B]). In the \(\mu \phi\)-space, the hypothesis H is represented by the curve \(\phi = \displaystyle {\frac{1}{\psi _H^2 }\mu ^{-2}}\) and determines the subsets \({\varvec{\Theta }}_a\) and \({\varvec{\Theta }}_b\) visualized in Fig. 5.

Fig. 5

Test on the coefficient of variation \(\psi\) of a Gaussian population. Data refers to Example 4. In the plots, the sets \({\varvec{\Theta }}_a\), \({\varvec{\Theta }}_b\) and \({\varvec{\Theta }}_H\) are reported for \(n=10\) ([A]) and \(n=40\) ([B])

In case [A] we have

$$\begin{aligned} {\mathbb {P}} \big ( (\mu , \phi ) \in {\varvec{\Theta }}_b\ \vert \ \varvec{x} \big )= \int _{{\varvec{\Theta }}_b} g_1( \mu , \phi \mid \eta , \nu , \alpha , \beta ) \, \textrm{d}\mu \, \textrm{d} \phi = 0.215, \end{aligned}$$

where \(g_1( \mu , \phi \mid \eta , \nu , \alpha , \beta )\) is the Normal-Gamma density, so that \(\delta _H = 0.570\) and there is not enough evidence against H. In case [B], we have \({\mathbb {P}} \big ( (\mu , \phi ) \in {\varvec{\Theta }}_b \ \vert \ \varvec{x} \big ) = 0.014\) and, since \(\delta _H = 0.972\), there is strong evidence against H. Therefore in such a case, with different sample sizes, the inferential conclusions change (Fig. 6).
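Both probabilities above can be approximated by Monte Carlo sampling from the Normal-Gamma posterior; the following sketch is illustrative (the number of draws and the seed are arbitrary choices of ours).

```python
# Sketch: Example 4. Monte Carlo approximation of the BDM for the coefficient of
# variation psi = 1/(|mu| * sqrt(phi)), sampling (mu, phi) from the Normal-Gamma posterior.
import numpy as np

rng = np.random.default_rng(1)
xbar, s2, psi_H, draws = 17.0, 1.6, 0.1, 1_000_000

for n in (10, 40):                                     # cases [A] and [B]
    eta, nu, alpha, beta = xbar, n, (n - 1) / 2, n * s2 / 2
    phi = rng.gamma(shape=alpha, scale=1 / beta, size=draws)  # phi | x ~ Gamma(alpha, rate=beta)
    mu = rng.normal(loc=eta, scale=1 / np.sqrt(nu * phi))     # mu | phi, x ~ N(eta, 1/(nu*phi))
    psi = 1 / (np.abs(mu) * np.sqrt(phi))
    p = np.mean(psi > psi_H)
    print(n, round(1 - 2 * min(p, 1 - p), 3))   # roughly 0.57 (n = 10) and 0.97 (n = 40)
```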

Example 5

- Test on the skewness coefficient of the Inverse Gaussian distribution Let us consider an Inverse Gaussian random variable X with density

$$\begin{aligned} f(x \mid \mu , \nu )=\sqrt{\frac{\nu }{2\pi x^3}} \exp {\Big \{ -\frac{\nu }{2} \Big (\frac{x-\mu }{\mu \sqrt{x}}\Big )^2\Big \}} \cdot {\textbf {1}}_{\mathbb {R}^+}(x), \end{aligned}$$

where \((\mu ,\nu ) \in \mathbb {R}^+ \times \mathbb {R}^+\). The parameter of interest is the skewness coefficient \(\gamma = 3 \sqrt{\frac{\mu }{\nu }}\), and we wish to test the hypothesis \(H: \gamma =\gamma _H\), where \(\gamma _H = 2\). The Jeffreys’ prior is

$$\begin{aligned} g_0(\mu ,\nu ) \propto \frac{1}{\sqrt{\mu ^3 \nu }} \cdot {\textbf {1}}_{\mathbb {R}^+\times \mathbb {R}^+}(\mu ,\nu ). \end{aligned}$$

Given n observations, the posterior distribution of \((\mu , \nu )\) is

$$\begin{aligned} g_1(\mu , \nu \vert \varvec{x}) \; \propto \; \sqrt{\frac{\nu ^{n-1}}{\mu ^3}} \cdot \exp { \left\{ - \frac{n \ \nu }{2}\cdot \left( \frac{\bar{x}}{\mu ^2} - \frac{2}{\mu } + \frac{1}{a} \right) \right\} } \cdot {\textbf {1}}_{\mathbb {R}^+\times \mathbb {R}^+}(\mu ,\nu ), \end{aligned}$$

where \(\bar{x}\) and a denote the arithmetic and the harmonic mean of the data, respectively.

We apply the procedure to the following rainfall data (inches) analyzed in Folks and Chhikara (1978) (p. 272):

$$\begin{aligned} \begin{array}{ccccccc} 1.01 &{}\quad 1.11 &{}\quad 1.13 &{}\quad 1.15 &{}\quad 1.16\\ 1.17 &{}\quad 1.17 &{}\quad 1.20 &{}\quad 1.52 &{}\quad 1.54\\ 1.54 &{}\quad 1.57 &{}\quad 1.64 &{}\quad 1.73 &{}\quad 1.79\\ 2.09 &{}\quad 2.09 &{}\quad 2.57 &{}\quad 2.75 &{}\quad 2.93\\ 3.19 &{}\quad 3.54 &{}\quad 3.57 &{}\quad 5.11 &{}\quad 5.62. \end{array} \end{aligned}$$

The hypothesis identifies in the parameter space \(\varvec{\Theta } = \mathbb {R}^+ \times \mathbb {R}^+\) the subsets

$$\begin{aligned} \begin{array}{ll} \varvec{\Theta }_a \;=&{} \Big \{(\mu ,\nu )\in \varvec{\Theta }: 3\sqrt{\frac{\mu }{\nu }} < \gamma _H \Big \} \,, \\ \varvec{\Theta }_H \;=&{} \Big \{(\mu ,\nu )\in \varvec{\Theta }: 3 \sqrt{\frac{\mu }{\nu }} = \gamma _H \Big \} \,, \\ \varvec{\Theta }_b \;=&{} \Big \{(\mu ,\nu )\in \varvec{\Theta }: 3\sqrt{\frac{\mu }{\nu }} > \gamma _H \Big \} \,. \end{array} \end{aligned}$$

We have that

$$\begin{aligned} \mathbb {P}\big ((\mu ,\nu ) \in \varvec{\Theta }_b \ \vert \ \varvec{x} \big ) \; = \; \int _{\varvec{\Theta }_b} g_1(\mu , \nu \vert \varvec{x}) \ \textrm{d}\mu \; \textrm{d}\nu = 0.078\,, \end{aligned}$$
(18)

(see Fig. 6); we then obtain \(\delta _H = 0.844\). This result indicates that we do not have enough evidence against the hypothesis H.
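One way to evaluate (18) numerically is to integrate \(\nu\) out in closed form (a Gamma integral over \(\nu < 9\mu /\gamma _H^2\)) and then integrate over \(\mu\) by quadrature; the sketch below follows this route, which is our own reduction rather than the computation used for the figure.

```python
# Sketch: Example 5. nu is integrated out analytically; the remaining integrals in mu
# are computed by quadrature (finite bounds chosen where the integrand is negligible).
import numpy as np
from scipy.special import gammainc
from scipy.integrate import quad

x = np.array([1.01, 1.11, 1.13, 1.15, 1.16, 1.17, 1.17, 1.20, 1.52, 1.54,
              1.54, 1.57, 1.64, 1.73, 1.79, 2.09, 2.09, 2.57, 2.75, 2.93,
              3.19, 3.54, 3.57, 5.11, 5.62])
n, xbar, a = len(x), x.mean(), 1 / np.mean(1 / x)      # arithmetic and harmonic means
gamma_H, s = 2.0, (n + 1) / 2                          # s: Gamma shape after integrating nu

def c(mu):                                             # rate of the Gamma integral in nu
    return 0.5 * n * (xbar / mu**2 - 2 / mu + 1 / a)

marg = lambda mu: mu**-1.5 * c(mu)**-s                               # unnormalized marginal in mu
num = lambda mu: marg(mu) * gammainc(s, c(mu) * 9 * mu / gamma_H**2)  # region nu < 9*mu/gamma_H^2

p_b = quad(num, 1e-6, 50.0)[0] / quad(marg, 1e-6, 50.0)[0]
print(round(p_b, 3), round(1 - 2 * min(p_b, 1 - p_b), 3))   # roughly 0.078 and 0.844
```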

Fig. 6

Test on the skewness of the Inverse Gaussian distribution with \(\gamma _H = 2\). In the plot the sets of the partition induced by H are reported. Data refers to Example 5

3.2.2 Tests involving two independent populations

In this section we consider some examples concerning comparisons between parameters of two independent populations.

Example 6

- Comparison between means and precisions of two independent Normal populations Let us consider a case study on the dating of the core and periphery of some wooden furniture, found in a Byzantine church, using radiocarbon (see Casella and Berger (2001), p. 409). The historians wanted to verify whether the mean age of the core is the same as the mean age of the periphery, using two samples of sizes \(m =14\) and \(n = 9\), respectively, given by

$$\begin{array}{*{11}{c}} \text {core} & 1294 & 1279 & 1274 & 1264 & 1263 & 1254 & 1251 & 1251 & 1248 & 1240 \\ & 1232 & 1220 & 1218 & 1210 & & & & & & \\ \text {periphery} & 1284 & 1272 & 1256 & 1254 & 1242 & 1274 & 1264 & 1256 & 1250 & \end{array}$$

We assume that the age of the core X and of the periphery Y are distributed as

$$\begin{aligned} X \sim N(x \vert \mu _1,\phi _1^{-1}) \quad \textrm{and} \quad Y \sim N(y\vert \mu _2,\phi _2^{-1}), \end{aligned}$$

where \(Var(X) = \phi _1^{-1}\) and \(Var(Y) = \phi _2^{-1}\), and we assume that the data are iid conditional on the parameters. We consider for \((\mu _i,\phi _i)\) the Jeffreys’ prior

$$\begin{aligned} g_0^i(\mu _i,\phi _i) \propto \phi _i^{-1} \cdot {\textbf{1}}_{\mathbb {R}\times \mathbb {R}^+} \;, \; i=1,2 \,. \end{aligned}$$

We obtain \(\bar{x} = 1249.86\), \(\bar{y} = 1261.33\), \(\bar{d} = \bar{x} - \bar{y} = -11.48\), while the MLEs for the sample standard deviations are \(s_1 = 23.43\) and \(s_2 = 12.51.\) The posterior distribution for \((\mu _i,\phi _i)\) is the Normal-Gamma law

$$\begin{aligned} (\mu _i,\phi _i) \mid \varvec{x}, \varvec{y} \; \sim \; NG \big ( \mu _i, \phi _i \mid \eta _i, \nu _i, \alpha _i, \beta _i \big ) \,, \; i=1,2, \, \end{aligned}$$

with hyperparameters \(\eta _1 = \bar{x}\), \(\nu _1 = m\), \(\alpha _1 = {\frac{1}{2}}(m-1)\), \(\beta _1 = {\frac{1}{2}}m s_1^2\)   and   \(\eta _2 = \bar{y}\), \(\nu _2 = n\), \(\alpha _2 = {\frac{1}{2}}(n-1)\), \(\beta _2 = {\frac{1}{2}}n s_2^2\), and density

$$\begin{aligned} g^i_1( \mu _i, \phi _i \mid \eta _i, \nu _i, \alpha _i, \beta _i)=\frac{\beta _i^{\alpha _i}\sqrt{\nu _i} }{\Gamma (\alpha _i)\sqrt{2 \pi }} \phi _i^{\alpha _i-1/2} e^{-\frac{\nu _i \phi _i}{2} (\mu _i- \eta _i)^2}e^{-\beta _i\phi _i}, \; i=1,2. \end{aligned}$$

The hypothesis of interest

$$\begin{aligned} H_A: \mu _1 -\mu _2 = 0, \quad \forall \phi _1> 0, \quad \forall \phi _2 > 0, \end{aligned}$$

identifies the following subsets in the parameter space

$$\begin{aligned} \begin{array}{ll} {\varvec{\Theta }}_a &{}= \big \{ (\mu _1, \mu _2, \phi _1, \phi _2) \in {\mathbb {R}}^2 \times {\mathbb {R}}_+^2: \mu _1 < \mu _2 \big \},\\ {\varvec{\Theta }}_{H_A} &{}= \big \{ (\mu _1, \mu _2, \phi _1, \phi _2) \in {\mathbb {R}}^2 \times {\mathbb {R}}_+^2: \mu _1 = \mu _2 \big \},\\ {\varvec{\Theta }}_b &{}= \big \{ (\mu _1, \mu _2, \phi _1, \phi _2) \in {\mathbb {R}}^2 \times {\mathbb {R}}_+^2: \mu _1 > \mu _2 \big \}. \end{array} \end{aligned}$$

Then we can compute

$$\begin{aligned} {\mathbb {P}}\big ( (\mu _1, \mu _2, \phi _1, \phi _2) \in {\varvec{\Theta }}_a \ \vert \ \varvec{x}, \varvec{y}\big )&= \int _{{\varvec{\Theta }}_a} \prod _{i=1}^{2} g^i_1( \mu _i, \phi _i \mid \eta _i, \nu _i, \alpha _i, \beta _i) \, \textrm{d}\mu _1 \, \textrm{d}\mu _2 \, \textrm{d} \phi _1 \, \textrm{d} \phi _2 \\&= \int _{\mu _1< \mu _2} \prod _{i=1}^{2} \frac{\Gamma (\alpha _i+\frac{1}{2})}{\Gamma (\alpha _i)}\Big (\frac{\nu _i}{2 \pi \beta _i}\Big )^{1/2} \Big [1+\frac{\nu _i}{2 \beta _i}(\mu _i-\eta _i)^2\Big ]^{-(\alpha _i+\frac{1}{2})} \, \textrm{d}\mu _1 \, \textrm{d}\mu _2 \\&= 0.089 \,, \end{aligned}$$

so we have \(\delta _H = 0.823\), a value that does not indicate evidence against the hypothesis. We exploited the fact that the marginal of each \(\mu _i\) is a Generalized Student’s t-distribution (denoted by StudentG) with hyperparameters \(\big ( \eta _i, \frac{ \nu _i\alpha _i}{\beta _i}, 2 \alpha _i \big )\).
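Since the marginal posterior of each \(\mu _i\) is a location-scale Student's t, the two probabilities entering (9) can also be approximated by plain Monte Carlo; in the sketch below we read the generalized Student's t with hyperparameters \(\big ( \eta _i, \frac{\nu _i\alpha _i}{\beta _i}, 2\alpha _i \big )\) as a t with \(2\alpha _i\) degrees of freedom, location \(\eta _i\) and scale \(\sqrt{\beta _i/(\nu _i\alpha _i)}\); seed and number of draws are arbitrary.

```python
# Sketch: Example 6, comparison of the means. Monte Carlo draws from the two
# independent Student's t marginals of mu_1 (core) and mu_2 (periphery).
import numpy as np

rng = np.random.default_rng(2)
draws = 1_000_000

def draw_mu(xbar, n, s2, size):
    eta, nu, alpha, beta = xbar, n, (n - 1) / 2, n * s2 / 2
    scale = np.sqrt(beta / (nu * alpha))               # scale of the Student's t marginal
    return eta + scale * rng.standard_t(df=2 * alpha, size=size)

mu1 = draw_mu(1249.86, 14, 23.43**2, draws)
mu2 = draw_mu(1261.33, 9, 12.51**2, draws)
p_min = min(np.mean(mu1 < mu2), np.mean(mu1 > mu2))
print(round(1 - 2 * p_min, 3))                         # roughly 0.82
```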

Fig. 7

Comparisons between means ([A]) and precisions ([B]) of independent normal populations for data in Example 6. For both cases we show the contour plots of the marginals of \(\mu _j\) ([A]) and \(\phi _j\) ([B]), and the partition sets associated with the corresponding hypotheses

Figure 7 [A] in the space \((\mu _1, \mu _2)\) shows the contour lines of the distribution

$$\begin{aligned} StudentG \Big ( \mu _1 \big \vert \, \eta _1, {\frac{\nu _1 \cdot \alpha _1}{\beta _1}}, 2\alpha _1 \Big ) \cdot StudentG \Big ( \mu _2 \big \vert \, \eta _2, {\frac{\nu _2 \cdot \alpha _2}{\beta _2}}, 2\alpha _2 \Big ). \end{aligned}$$

Note that the homoscedasticity assumption is not necessary. Consider now the hypothesis

$$\begin{aligned} H_B: \phi _1 - \phi _2 =0, \quad \forall \mu _1, \mu _2, \end{aligned}$$

which determines in the parameter space the subsets

$$\begin{aligned} \begin{array}{ll} {\varvec{\Theta }}_c &{}= \big \{ (\mu _1, \mu _2, \phi _1, \phi _2) \in {\mathbb {R}}^2 \times {\mathbb {R}}_+^2: \phi _1 < \phi _2 \big \}, \\ {\varvec{\Theta }}_{H_B} &{}= \big \{ (\mu _1, \mu _2, \phi _1, \phi _2) \in {\mathbb {R}}^2 \times {\mathbb {R}}_+^2: \phi _1 = \phi _2 \big \}, \\ {\varvec{\Theta }}_d &{}= \big \{ (\mu _1, \mu _2, \phi _1, \phi _2) \in {\mathbb {R}}^2 \times {\mathbb {R}}_+^2: \phi _1 > \phi _2 \big \}. \end{array} \end{aligned}$$

We have

$$\begin{aligned} {\mathbb {P}} \big ( (\mu _1, \mu _2, \phi _1, \phi _2) \in {\varvec{\Theta }}_c \ \vert \ \varvec{x}, \varvec{y} \big ) = \int _{\phi _1< \phi _2} \prod _{i=1}^{2} \frac{\beta _i^{\alpha _i}}{\Gamma (\alpha _i)}\phi _i^{\alpha _i-1}e^{-\phi _i\beta _i} \, \textrm{d}\phi _1 \, \textrm{d}\phi _2 = 0.046, \end{aligned}$$

from which it follows that \(\delta _H = 0.908\) and there is strong evidence against the hypothesis \(H_B\). To compute the integral we have used the fact that the marginal of each \(\phi _i\) has a Gamma distribution with parameters \((\alpha _i, \beta _i), \ i=1,2\).

The contour lines of the law \(Gamma ( \phi _1 \vert \alpha _1, \beta _1 ) \cdot Gamma ( \phi _2 \vert \alpha _2, \beta _2),\) in the space \((\phi _1, \phi _2)\), are reported in Figure 7 [B].

Example 7

- Comparison of the shape parameters of two Gamma distributions Let us consider two independent Gamma populations \(X_i \sim Gamma \big ( \alpha _i, \beta _i \big ),\) \((\alpha _i, \beta _i) \in {\mathbb {R}}^+ \times {\mathbb {R}}^+\), \(i=1,2\), and two iid samples of sizes \(n_1=9\) and \(n_2=12\), respectively, with sample means \(\bar{x}_1 = 2.811\) and \(\bar{x}_2 = 1.973\), and geometric means \(m_{g_1} = 2.116\) and \(m_{g_2} = 1.327\).

We are interested in testing \(H: \alpha _1 = \alpha _2.\) The posterior distribution for \((\alpha _1, \beta _1, \alpha _2, \beta _2)\) is given by

Fig. 8

Comparison of the shape parameters of two independent Gamma populations, using data of Example 7. The sets \(\Theta _a, \Theta _b\) and \(\Theta _H\) of the partition are reported

$$\begin{aligned} g_1( \alpha _1, \beta _1, \alpha _2, \beta _2 \vert {\varvec{x}_1}, {\varvec{x}_2})= g_1^1 (\alpha _1, \beta _1 \vert {\varvec{x}_1}) \cdot g_1^2 (\alpha _2, \beta _2 \vert {\varvec{x}_2} ) \,, \end{aligned}$$

where

$$\begin{aligned} g_1^i(\alpha _i, \beta _i \vert {\varvec{x}_i} ) = k_i \cdot g_0 ( \alpha _i, \beta _i ) \cdot L(\alpha _i, \beta _i \mid {\varvec{x}_i}), \end{aligned}$$

with normalizing constant \(k_i\), \(i=1,2\). Let \(\varvec{\Theta }_a = \big \{(\alpha _1,\alpha _2) \in \mathbb {R}^+ \times \mathbb {R}^+: \alpha _1 > \alpha _2\big \}\) and \(\varvec{\Theta }_b = \big \{(\alpha _1,\alpha _2) \in \mathbb {R}^+ \times \mathbb {R}^+: \alpha _1 < \alpha _2\big \}\) (see Figure 8). In order to test the hypothesis H, we compute the probability

$$\begin{aligned} {\mathbb {P}} ( (\alpha _1,\alpha _2) \in \varvec{\Theta }_b \ \vert \ {\varvec{x}_1}, {\varvec{x}_2} )&= \displaystyle \int _{\alpha _1< \alpha _2} \int _{{\mathbb {R}}^+ \times {\mathbb {R}}^+} g_1^1 (\alpha _1 , \beta _1 \vert {\varvec{x}_1} ) \cdot g_1^2(\alpha _2 ,\beta _2 \vert {\varvec{x}_2} ) \, \textrm{d} \beta _1 \, \textrm{d} \beta _2 \, \textrm{d}\alpha _1 \, \textrm{d}\alpha _2 \\&= \displaystyle \int _{\alpha _1 < \alpha _2} \prod _{i=1}^2 k_i \cdot g_0^\alpha (\alpha _i) \cdot {\frac{\Gamma (n_i \alpha _i)}{\Gamma (\alpha _i)^{n_i}}} \cdot \left( \frac{{m_{g_i}}}{n_i \bar{x}_i} \right) ^{n_i \alpha _i} \textrm{d}\alpha _1 \, \textrm{d}\alpha _2 = 0.311 \end{aligned}$$

and, since \(\delta _H = 0.378\), there is evidence in favour of H.
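As in Example 3, after integrating out each \(\beta _i\) the marginal posterior of \(\alpha _i\) depends on the data only through \(n_i\), \(\bar{x}_i\) and \(m_{g_i}\); the grid-based sketch below approximates \({\mathbb {P}}(\alpha _1 < \alpha _2 \vert \varvec{x}_1, \varvec{x}_2)\) (grid range and resolution are our own choices).

```python
# Sketch: Example 7. Marginal posteriors of alpha_1 and alpha_2 evaluated on a grid,
# then P(alpha_1 < alpha_2 | data) obtained from the resulting discrete approximations.
import numpy as np
from scipy.special import gammaln, polygamma

def log_marginal(a, n, xbar, mg):
    # log of the unnormalized marginal posterior of the Gamma shape parameter
    return (0.5 * np.log(a * polygamma(1, a) - 1.0)
            + gammaln(n * a) - n * gammaln(a)
            + n * a * (np.log(mg) - np.log(n * xbar)))

grid = np.linspace(1e-3, 20, 20_000)
lm1 = log_marginal(grid, 9, 2.811, 2.116)
lm2 = log_marginal(grid, 12, 1.973, 1.327)
w1 = np.exp(lm1 - lm1.max()); w1 /= w1.sum()
w2 = np.exp(lm2 - lm2.max()); w2 /= w2.sum()

p_b = float(np.sum(w2 * np.cumsum(w1)))                # approx P(alpha_1 < alpha_2 | data)
print(round(p_b, 3), round(1 - 2 * min(p_b, 1 - p_b), 3))   # roughly 0.31 and 0.38
```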

4 Comparison with the FBST

In this section we present a comparison of the BDT with the Full Bayesian Significance Test (FBST) as presented in Pereira and Stern (2020), which provides an overview of the e-value.

In order to facilitate the discussion, let us briefly review the definition of the e-value and the related testing procedure. The FBST can be used with any standard parametric statistical model, where \(\varvec{\theta } \in \Theta \ \subseteq \ \mathbb {R}^p\). It tests a sharp hypothesis H which identifies the null set \(\Theta _{H}\). The conceptual approach of the FBST consists of determining the e-value that represents the Bayesian evidence against H. To construct this measure, the authors introduce the posterior surprise function and its supremum, given respectively by

$$\begin{aligned} s(\varvec{\theta }) = \frac{g_1(\varvec{\theta } \vert \varvec{x})}{r(\varvec{\theta })} \quad \text { and }\quad s^*=s(\varvec{\theta }^*) = \sup _{\varvec{\theta } \in \Theta _{H}} s(\varvec{\theta }), \end{aligned}$$

where \(r(\varvec{\theta })\) is a suitable reference function to be chosen. Then, the tangential set to the sharp hypothesis H, also called the Highest Relative Surprise Set (HRSS), is defined as

$$\begin{aligned} \overline{T}(s^*) = \{\varvec{\theta } \in \Theta \vert s(\varvec{\theta }) > s^*\}, \end{aligned}$$

which includes all parameter values \(\varvec{\theta }\) that attain a surprise function value larger than the supremum \(s^*\) over the null set. Finally, the e-value, which represents the Bayesian evidence against H, is defined as

$$\begin{aligned} \overline{ev}(H) = \overline{W}(s^*) = \int _{\overline{T}(s^*)} g_1(\varvec{\theta } \vert \varvec{x}) \ d\varvec{\theta }. \end{aligned}$$

Correspondingly, the e-value in support of H is \(ev(H) = 1 - \overline{ev}(H)\), which is evaluated by means of the set \(T(s^*)=\Theta \setminus \overline{T}(s^*)\) and the cumulative surprise function \(W(s^*) = 1 - \overline{W}(s^*)\). In conclusion, the FBST is the procedure that rejects H whenever \(\overline{ev}(H)\) is large.

As pointed out in Pereira and Stern (2020) (Section 3.2), “the role of the reference density is to make \(\overline{ev}(H)\) explicitly invariant under suitable transformations of the coordinate system”. A first non-invariant definition of this measure, which corresponds to the use of a flat reference function \(r(\theta )\propto 1\) in the second formulation, was given in Pereira and Stern (1999). The first version involved the determination of the tangential set \(\overline{T}\) starting only from the posterior distribution, whereas in the second a corrective element has been introduced by also including the reference function. Some of the suggested choices for the reference function are uninformative priors such as “the uniform, maximum entropy densities, or Jeffreys’ invariant prior” (see Pereira and Stern (2020), Section 3.2).
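For a unimodal scalar posterior and a flat reference function, the tangential set is just the highest-posterior-density region whose boundary density equals \(g_1(\theta _H \vert \varvec{x})\). The sketch below is an illustrative reimplementation of this special case (it is not the ‘fbst’ package), applied to the posterior of case [A] in Example 1.

```python
# Sketch: e-value with a flat reference function for a unimodal scalar posterior.
# The tangential set is {theta : g1(theta | x) > g1(theta_H | x)}; its second
# endpoint is found by root search on the other side of the mode.
from scipy import stats, optimize

post = stats.invgamma(a=6, scale=7.2)        # Example 1, case [A]: n = 6, xbar = 1.2
mode = 7.2 / (6 + 1)                         # mode of InvGamma(a, scale) is scale / (a + 1)

def ev_bar(theta_H):
    s_star = post.pdf(theta_H)
    f = lambda t: post.pdf(t) - s_star
    if theta_H >= mode:
        lo, hi = optimize.brentq(f, 1e-8, mode), theta_H
    else:
        lo, hi = theta_H, optimize.brentq(f, mode, 1e3)
    return post.cdf(hi) - post.cdf(lo)       # posterior probability of the tangential set

print(ev_bar(2.4))                           # evidence against H: theta = 2.4
```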

4.1 Similarities and differences between the procedures

The most striking similarity between the FBST and the BDT is that both tests, fully accepting the likelihood principle and relying on the posterior distribution of the parameter \(\varvec{\theta } \in \varvec{\Theta }\), are clearly Bayesian.

Another important similarity is that, asymptotically, both tests lead to the rejection of the hypothesis H when it is false (i.e. when \(\theta _H \ne \theta ^*\), where \(\theta ^*\) is the true value of the parameter). On the contrary, if \(\theta ^*=\theta _H\) they have a different asymptotic behaviour (see Proposition 1 for the BDM and Section 3.4 in Pereira and Stern (2020) for the e-value).

Certainly, the FBST has a more general reach than the BDT. Indeed, it examines the entire class of sharp hypotheses, whereas the extension of the BDT to such hypotheses is not straightforward and, currently, is limited to considering the subclass of the hypotheses expressed as \(H:\varphi =\varphi _H\) that are able to partition the parameter space \(\varvec{\Theta }\) as \(\big \{ \varvec{\Theta }_a, \, \varvec{\Theta }_H, \, \varvec{\Theta }_b \big \}\). Moreover, notice that while the integration sets \(\varvec{\Theta }_a\) and \(\varvec{\Theta }_b\) are determined exclusively by the hypothesis, the tangential set \(\overline{T}\) depends on the hypothesis, the posterior density and the choice of the reference function. It is questionable, on the other hand, whether the e-value is as easily computable as the BDM is in cases where the parameter space has dimension higher than 1.

Unlike for the BDM, the elimination of nuisance parameters is not recommended when using the e-value. In fact, this measure is not invariant with respect to marginalisation over the nuisance parameters, and the use of marginal densities to construct credible sets may produce inconsistencies.

It is easy to see that one can create an analogy between the p-value, the e-value and \(\delta _H\). Regarding frequentist p-values, the sample space is ordered according to increasing inconsistency with the assumed null hypothesis H. The FBST instead orders the parameter space according to increasing inconsistency with the assumed null hypothesis H, based on the concept of statistical surprise. Similarly, the probability in (7) is the posterior probability of exceeding \(\theta _H\) in the direction that contrasts with the data (namely, moving away from the side where most of the posterior probability lies).

Another similarity occurs when considering the reference density \(r(\theta )\) as the (possibly improper) uniform density, since the first and second definitions of evidence define the same tangent set, i.e. the HRSS and the HPDS coincide. Then, for a scalar parameter \(\theta\), since the BDM is linked to the equi-tailed credible regions while the e-value is linked to the HPDS, we have that if:

  • \(g_1(\theta \vert \varvec{x})\) is symmetric and unimodal, then \(\overline{ev}(H) = \delta _H\);

  • \(g_1(\theta \vert \varvec{x})\) is asymmetric and unimodal (for instance with positive skewness) and \(m_1 < \theta _H\) [\(\theta _H < m_1\)], then \(\overline{ev}(H) > \delta _H\) [\(\overline{ev}(H) < \delta _H\)]. When \(m_1=\theta _H\) we have \(0=\delta _H < \overline{ev}(H)\).

4.1.1 Simulation study

In order to determine the resulting false-positive rates of both the FBST and the BDT, we conduct a simulation study for specific sample sizes, considering a continuous (Exponential) and a discrete (Poisson) model and, for each one, two different choices for the prior distribution: the Jeffreys’ and the conjugate priors. The latter have been chosen to have mean “far” from the true hypothesized values of the parameters. Regarding the FBST, we have considered two different choices for the reference function \(r(\theta )\): the flat and the prior density.

Let \(\varvec{x}=(x_1, \dots , x_n)\) be an iid sample of size n from the Exponential distribution \(X \sim Exp\big (x \vert 1/\theta ^* \big )\), with \(\theta ^*=1.2\). We are interested in testing the hypothesis \(H: \theta _H=\theta ^*=1.2\). Assuming the Jeffreys’ prior \(g_0(\theta ) \propto \theta ^{-1}\), the posterior distribution is \(InvGamma(\theta \vert n, \sum x_i)\) (see Example 1), while adopting a \(InvGamma(\theta \vert \alpha _0, \beta _0)\) prior, with \(\alpha _0=3\) and \(\beta _0=6\), we have a posterior that is still \(InvGamma(\theta \vert \alpha _1, \beta _1)\), with parameters \(\alpha _1=\alpha _0+n\) and \(\beta _1=\beta _0+\sum x_i\). Let now \(\varvec{y}=(y_1, \dots , y_n)\) be an iid sample of size n from a Poisson distribution \(Y \sim Poi\big (y \vert \lambda ^* \big )\), with \(\lambda ^*=3\). Interest is on the hypothesis \(H: \lambda _H=\lambda ^*=3\). For both choices of the prior, the Jeffreys’ \(g_0(\lambda ) \, \propto \; \lambda ^{-\frac{1}{2}}\) and the conjugate \(Gamma(\lambda \vert \alpha _0, \beta _0)\), we have a Gamma posterior \(Gamma(\lambda \vert \alpha _1, \beta _1)\), with parameters respectively equal to \(\alpha _1=\sum y_i+ \frac{1}{2}\), \(\beta _1=n\) and \(\alpha _1 = \alpha _0 + \sum y_i\), \(\beta _1 = \beta _0 + n\).

Table 1 shows the simulation results for three different values of the threshold \(\omega =\{0.90,0.95,0.99\}\), for \(S=50000\) simulations and \(D=50000\) posterior draws for the Exponential model. Concerning the Exponential model with the Jeffreys’ prior, across the different sample sizes considered the false-positive rates are very similar for both tests (the two different versions of the FBST and the BDM) and, as we expect since we are using objective priors (see Bayarri and Berger (2004)), they are close to the type I error rates \(\alpha =\{0.10, 0.05, 0.01\}\) associated with \(\omega\). With the conjugate prior the BDM seems to perform better w.r.t. the two versions of the FBST. Concerning the Poisson model, we have good results for large sample sizes, but also for smaller n, especially with the conjugate prior (see Table 2).

Table 1 Exponential distribution: false positive rates for different sample sizes n and different thresholds \(\omega\).
Table 2 Poisson distribution: false positive rates for different sample sizes n and different thresholds \(\omega\)

4.1.2 Some examples

In order to compare the BDM and the e-value, let us consider different situations and then examine the results.

Example 8

(Continuation of Example 1) As a first comparative scenario, consider the test performed in Example 1, in which \(\theta _H=2.4\), and additionally the case in which \(\theta _H=0.7\). Since the posterior \(g_1(\theta \vert \varvec{x})\) has positive skewness and \(m_1 < \theta _H=2.4\), we have \(\overline{ev}(H) > \delta _H\); on the contrary, for \(m_1 > \theta _H=0.7\), we have \(\overline{ev}(H) < \delta _H\). Indeed, we find the results reported in Table 3.

Table 3 Results, for the three different cases examined in Example 1, of \(\delta _H\) and the e-value considering, as a reference distribution, both a flat reference function and a Jeffreys’ prior

The differences between the e-value and \(\delta _H\), which in this example appear to be modest, can become meaningful when the posterior has greater asymmetry and heavier tails. In such cases, when comparing different hypotheses, the FBST always favours the hypothesis with higher density. Moreover, the e-value may be more or less robust w.r.t. the position of \(\theta _H\), as highlighted in the example below.

Example 9

- Test on the mean of the Inverse Gaussian distribution Consider a random variable X with Inverse Gaussian distribution \(X \sim IG(x \vert \mu , \nu _0)\), \(\mu \in {\mathbb {R}}^+\) and \(\nu _0\) known. Given an iid sample \(\varvec{x}\) of size n, the likelihood function for \(\mu\) is \(L(\mu \vert \varvec{x}) \propto \exp \left\{ -n \nu _0 \cdot \left( \frac{\bar{x}}{2\mu ^2}- \frac{1}{\mu }\right) \right\} .\) Adopting the Jeffreys’ prior \(g_0(\mu ) \propto \frac{1}{\sqrt{\mu ^3}}\), we obtain the posterior

$$\begin{aligned}g_1(\mu \vert {\varvec{x}}) \propto \frac{1}{\sqrt{\mu ^3}} \cdot \exp \left\{ -n \nu _0 \cdot \left( \frac{\bar{x}}{2\mu ^2}- \frac{1}{\mu }\right) \right\} . \end{aligned}$$

We are interested in testing the hypothesis \(H: \, \mu = \mu _H\) and we consider a sample of size \(n=8\) for which \(\bar{x} = 4.2\) and \(m_1 = 4.483\). For \(\nu _0=5\), we choose to test \(H_A: \mu = 2.5\) and \(H_B: \mu = 12\). The results of the analysis are displayed in Table 4 and Fig. 9. If we choose \(\omega = 0.95\) as a rejection threshold, in both cases, and with both reference functions, we are led to opposite inferential conclusions.

Table 4 For the two different hypotheses examined in Example 9, the table shows \(\delta _H\) and the e-value considering, as a reference distribution, both a flat reference function and a Jeffreys’ prior
Fig. 9

Posterior density function \(g_1(\mu \vert \varvec{x} )\) associated to Example 9. In [A] we have \(\mu _H = 2.5 < m_1\), while in [B] \(\mu _H = 12 > m_1.\)

Example 10

(Continuation of Examples 3, 4, 5) Let us now compare the results obtained with the FBST and the BDT for the Examples 3, 4 and 5, when fixing a value of 0.95 as a rejection threshold.

The conclusions reached with the FBST and with the BDT for Example 3, which can be seen in Table 5, are the same (for both reference functions considered) although, in some cases, there are substantial differences between the values of the evidence measures. To summarise, the hypothesis \(H_B\) has to be rejected while not enough evidence is available for the rejection of the hypotheses \(H_A\) and \(H_C\).

Moving on to Example 4, the analysis of the findings with the two tests is more involved than the previous one (see Table 6). In case [A], for both the BDT and the FBST with the flat reference function, there is not enough evidence to reject the hypothesis. On the contrary, if one considers the FBST with the Jeffreys’ prior as reference function, one is led to reject this hypothesis. In case [B], by rejecting the hypothesis, the BDT is in agreement with the FBST with the Jeffreys’ reference function, in contrast to the FBST with the flat reference function, for which there is not enough evidence to reject it.

Finally, in the case illustrated in Example 5, the conclusion reached with the FBST and with the BDT is the same (for both reference functions considered), i.e. there is not enough evidence to reject the hypothesis (see Table 7). It should be noted that, again, there are substantial differences between the values of the evidence measures.

Table 5 Results of the Example 3 on the test on the shape parameter, mean and variance of the Gamma distribution
Table 6 Results of the Example 4 on the test of the coefficient of variation for a Normal distribution
Table 7 Results of the Example 5 on the test of the skewness coefficient of the Inverse Gaussian distribution

The calculation of the FBST for a scalar parameter of interest without nuisance parameters has been carried out through the function defined in the ‘fbst’ package for R (Kelter 2022). Instead, the tangential sets \(\overline{T}\) and their integrals, for Examples 3, 4 and 5, were determined by means of the Mathematica software. Browsing through the code that leads to the calculation of these measures (see Manca (2022)), it is evident that more work is required for the calculation of the integration region related to the FBST. In this sense, the BDT appears to be easier to apply.

5 Conclusions

We propose a new measure of evidence in a Bayesian perspective. The examples illustrate the conceptual simplicity of the proposed method as well as its theoretical consistency. We have presented some simple cases where the computation of the BDM is straightforward.

In some situations, the BDM can be usefully applied with a subjective prior. The situation where one or more statisticians choose the hypothesis H and the prior according to their own knowledge is indeed interesting; in such cases the BDT would have a confirmatory value. The use of subjective priors must be accompanied by a robustness study, especially in the case of small sample sizes.

So far we have considered only hypotheses that induce a partition on the parameter space, but the extension of the definition and the analysis of the BDT to more complex hypotheses is under investigation. Theoretical and computational developments in more general contexts are also being explored.