1 Introduction

In the preface of Lindley (1965) (p. xi), D. V. Lindley stated that

“ ...hypothesis testing looms large in standard statistical practice, yet scarcely appears as such in the Bayesian literature.”

Since then things have changed and, in the last sixty years, there have been several attempts to build a measure of evidence that covers, in a Bayesian context, the role that the p-value has played in the frequentist setting. A prominent example is the decision test based on the Bayes Factor and its extensions (see, for instance, Berger (1985)).

As an alternative to the Bayes Factor, another Bayesian evidence measure is provided in Pereira and Stern (1999), upon which the testing procedure Full Bayesian Significance Test (FBST) is based. For a recent survey on the FBST see Pereira and Stern (2020).

The main aim of this paper is to give a contribution to the testing procedure of precise hypotheses. In particular, the proposed Bayesian measure of evidence, called Bayesian Discrepancy Measure (BDM), gives an absolute evaluation of a hypothesis H in light of prior knowledge about the parameter and observed data. The proposed measure of evidence has the desired properties of invariance under reparametrization and consistency for large samples.

Our starting point is the idea that a hypothesis may be more or less supported by the available evidence contained in the posterior distribution.

We do not adopt the hypothesis testing approach in which a hypothesis can be rejected only by comparing it with another hypothesis (Neyman-Pearson in the frequentist perspective, the Bayes factor in the Bayesian one), but rather the approach proposed by Fisher (see Christensen (2005) and Deni (2004)). Reference is made to a precise hypothesis H and no alternative is considered against it. In this view, different hypotheses made by several experts can be evaluated using the information coming from the same data, and some can be accepted while others are not. In this respect, in a broad sense, we can say that we return to Fisher’s original idea of pure significance, according to which “Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis” (Fisher 1925).

The proposed measure of evidence can be seen as a Bayesian tool for model checking, that is, as a technique that can aid in the actual specification of a model, without the need to make explicit reference to alternative models or hypotheses. For an extensive discussion of this point and the difference with the procedure of Bayesian model selection see O’Hagan (2003).

The structure of the paper is as follows. In Sect. 2 the definition of the proposed measure is presented for a scalar parameter of interest, both in the absence or presence of nuisance parameters. In Sect. 3 different illustrative examples are discussed, involving one or two independent populations. Finally, in Sect. 4 we make a comparison between the Bayesian Discrepancy Test and the Full Bayesian Significance Test which is based on the e-value, a well-known Bayesian evidence index used to test sharp hypotheses. The last section contains conclusions and directions for further research.

2 The Bayesian discrepancy measure

Let \((\mathcal {X}, \mathcal {P}^{X}_{\varvec{\theta }}, \varvec{\Theta })\) be a parametric statistical model where \(X \in \mathcal {X} \subset \mathbb {R}^k\), \(\mathcal {P}^{X}_{\varvec{\theta }}=\{f(x \vert \varvec{\theta })\ \vert \ \varvec{\theta } \in \varvec{\Theta }\}\) is a class of probability distributions (Lebesgue integrable) defined on \(\mathcal {X}\), depending on an unknown vector of continuous parameters \(\varvec{\theta } \in \varvec{\Theta }\), an open subset of \(\mathbb {R}^p\). Assume that

  1. (a)

    the model is identifiable;

  2. (b)

    \(f(x \vert \varvec{\theta })\) has support not depending on \(\varvec{\theta }\), \(\forall \ \varvec{\theta } \in \varvec{\Theta }\);

  3. (c)

    the log-likelihood function is at least twice differentiable in \(\varvec{\theta }\);

  4. (d)

    the operations of integration and differentiation with respect to \(\varvec{\theta }\) can be exchanged.

We assume a prior probability density \(g_0(\varvec{\theta })\) following Cromwell’s Rule, which states that “it is inadvisable to attach probabilities of zero to uncertain events, for if the prior probability is zero so is the posterior, whatever be the data. A probability of one is equally dangerous because then the probability of the complementary event will be zero” (see Section 6.2 in Lindley (1991)). We therefore assume that \(g_0(\varvec{\theta }) >0, \, \forall \varvec{\theta }\), since this assumption is needed to establish consistency (see Proposition 1 in the next section).

First, we discuss the case of a scalar parameter. Then we discuss the case of a scalar parameter of interest in the presence of nuisance parameters.

2.1 The Bayesian discrepancy measure for a scalar parameter

In this section we assume that \(k=p=1\). Given an iid random sample \(\varvec{x}=(x_1,\ldots ,x_n)\) from \(\mathcal {P}^{X}_{\theta }\), let \(L(\theta \vert \varvec{x})\) be the corresponding likelihood function based on data \(\varvec{x}\) and let \(g_0(\theta )\) be a continuous prior distribution on \(\Theta \subseteq \mathbb {R}\). The posterior probability density for \(\theta\) given \(\varvec{x}\) is then

$$\begin{aligned} g_1(\theta \vert \varvec{x}) \propto g_0(\theta ) \, L(\theta \vert \varvec{x}). \end{aligned}$$

Moreover, given the posterior distribution function \(G_1(\theta \vert \varvec{x})\), the posterior median is any real number \(m_1\) which satisfies the inequalities \(G_1(m_1\vert \varvec{x}) \ge \frac{1}{2}\) and \(G_1^-(m_1 \vert \varvec{x}) \le \frac{1}{2}\), where \(\displaystyle G_1^-(m_1 \vert \varvec{x})=\lim _{\theta \uparrow m_1} G_1(\theta \vert \varvec{x})\). In the case in which \(G_1(\cdot \vert \varvec{x})\) is continuous and strictly increasing we have \(m_1= G_1^{-1}(\frac{1}{2} \vert \varvec{x})\). Under the assumptions made at the beginning of Sect. 2, the posterior median \(m_1\) is uniquely defined.

We are interested in testing the precise hypothesis

$$\begin{aligned} H: \theta = \theta _H. \end{aligned}$$
(1)

In order to measure the discrepancy of the hypothesis (1) w.r.t. the posterior distribution, in the case \(\Theta =\mathbb {R}\), we consider the following two intervals:

  1. 1.

    the discrepancy interval

    $$\begin{aligned} I_H = \left\{ \begin{array}{ll} (m_1,\theta _H) &{} \text { if } \quad m_1 < \theta _H \\ \{m_1\} &{} \text { if } \quad m_1 = \theta _H, \\ (\theta _H,m_1) &{} \text { if } \quad m_1 > \theta _H \\ \end{array} \right. \end{aligned}$$
    (2)
  2. 2.

    the external interval

    $$\begin{aligned} I_E = \left\{ \begin{array}{ll} (\theta _H,+\infty ) &{} \text { if } \quad m_1< \theta _H \\ (-\infty ,\theta _H) &{} \text { if } \quad \theta _H < m_1. \\ \end{array} \right. \end{aligned}$$
    (3)

When \(m_1=\theta _H\), the external interval \(I_E\) can be \((-\infty , m_1)\) or \((m_1,+\infty )\). Note that, by construction, \(\mathbb {P}(\theta \in I_H \cup I_E \vert \varvec{x})=\frac{1}{2}\) (see Fig. 1). If the support of the posterior is a proper subset of \(\mathbb {R}\), the intervals \(I_H\) and \(I_E\) are defined accordingly.

Fig. 1

Posterior density \(g_1(\theta \vert \varvec{x})\), the corresponding discrepancy interval \(I_H\) and external interval \(I_E\) when \(\theta _H < m_1\) ([A]) and \(\theta _H > m_1\) ([B])

Definition 1

Given the posterior distribution function \(G_1(\theta \vert \varvec{x})\), we define the Bayesian Discrepancy Measure of the hypothesis H as

$$\begin{aligned} \delta _H = 2 \, \mathbb {P}(\theta \in I_H \vert \varvec{x}) = 2 \int _{I_H} d G_1(\theta \vert \varvec{x}). \end{aligned}$$
(4)

The measure can be also computed by means of the external interval as

$$\begin{aligned} \delta _H = 1 - 2 \, \mathbb {P}(\theta \in I_E \vert \varvec{x}) = 1 - 2 \int _{I_E} d G_1(\theta \vert \varvec{x}), \end{aligned}$$
(5)

which can also be written as

$$\begin{aligned} \delta _H = 1 - 2 \min \{G_1^{-}(\theta _H \vert \varvec{x}),1-G_1(\theta _H \vert \varvec{x})\}, \end{aligned}$$
(6)

where \(\displaystyle G_1^-(\theta _H \vert \varvec{x})=\lim _{\theta \uparrow \theta _H} G_1(\theta \vert \varvec{x})\). In our case, since \(G_1(\theta _H \vert \varvec{x})\) is continuous, this simplifies to

$$\begin{aligned} \delta _H = 1 - 2 \min \{G_1(\theta _H \vert \varvec{x}),1-G_1(\theta _H \vert \varvec{x})\}. \end{aligned}$$
(7)

Formulations (6) and (7) have the advantage of not involving the posterior median in the integral computation. Furthermore, one can interpret the quantity \(\min \{G_1(\theta _H \vert \varvec{x}),1-G_1(\theta _H \vert \varvec{x})\}\) as the posterior probability of a “tail” event concerning only the precise hypothesis H. Doubling this “tail” probability, related to the precise hypothesis H, one gets a posterior probability assessment about how “central” the hypothesis H is and hence how it is supported by the prior and the data.
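As a minimal illustration of formula (7), the sketch below (in Python) computes \(\delta _H\) for a posterior with an available closed-form distribution function; the Beta posterior and the numbers used are purely illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of formula (7): delta_H = 1 - 2 * min{G1(theta_H), 1 - G1(theta_H)}.
# The Beta posterior below (Bernoulli model, conjugate prior) is purely illustrative.
from scipy import stats

def bdm(posterior, theta_H):
    """Bayesian Discrepancy Measure for a scalar parameter, given a frozen posterior."""
    G1 = posterior.cdf(theta_H)
    return 1.0 - 2.0 * min(G1, 1.0 - G1)

a0, b0, successes, n = 0.5, 0.5, 60, 100                # hypothetical prior and data
posterior = stats.beta(a0 + successes, b0 + n - successes)
print(bdm(posterior, theta_H=0.5))                      # evidence against H: theta = 0.5
```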

It is important to highlight that the hypothesis H induces the following partition

$$\begin{aligned} \big \{ \Theta _a = (-\infty , \theta _H),\ \Theta _H = \{\theta _H\},\ \Theta _b = (\theta _H, \infty )\big \} \end{aligned}$$
(8)

of the parameter space \(\Theta\). Then formulations (6) and (7) can be equivalently expressed as

$$\begin{aligned} \delta _H = \; 1 - 2 \cdot \min _{a,b} \displaystyle \big \{ {\mathbb {P}}(\theta \in \Theta _a \vert \varvec{x}) \,, \, {\mathbb {P}}(\theta \in \Theta _b \vert \varvec{x}) \big \} \,. \end{aligned}$$
(9)

The last formula can be naturally extended to the case where, besides the scalar parameter of interest, nuisance parameters are also present. This issue will be developed in Sect. 2.2.

The following properties apply to the BDM, for a scalar parameter \(\theta\).

Proposition 1

  1. (i)

    \(\delta _H\) always exists and, by construction, \(\delta _H \in [0, 1]\);

  2. (ii)

    \(\delta _H\) is invariant under invertible monotonic transformations of the parameter \(\theta\);

  3. (iii)

    if \(\theta\) is an a.c. random variable, \(\theta ^*\) is the true value of the parameter and \(\theta ^*=\theta _H\), then \(\delta _{H}\) converges asymptotically to a \(Unif(\cdot \vert 0,1)\). Otherwise, if \(\theta ^* \ne \theta _H\), then \(\delta _{H} \, {\mathop {\rightarrow }\limits ^{\textit{p}}}\, 1\) (consistency property).

Proof

(i):

The first property follows immediately from the fact that in (4) the posterior probability \(\mathbb {P}(\theta \in I_H \vert \varvec{x}) \in \Big [0,\frac{1}{2}\Big ]\).

(ii):

Let \(\lambda =\lambda (\theta )\) be an invertible monotonic transformation of the parameter \(\theta\) and let \(K_1(\cdot )\) be the cumulative distribution function of the parameter \(\lambda\). We denote with \(\lambda _H=\lambda (\theta _H)\) and we notice that \(m'_1=\lambda (m_1)\) thanks to the monotonic invariance of the median. Suppose, for simplicity, that \(\theta _H>m_1\). Then

$$\begin{aligned} \delta _H=2\ \int _{m_1}^{\theta _H} dG_1(\theta \vert \varvec{x})\ = 2\ \Big \vert \int _{m'_1}^{\lambda _H} dK_1(\lambda \vert \varvec{x})\Big \vert . \end{aligned}$$

Therefore, the invariance of the BDM follows immediately from the invariance of the median under invertible monotonic transformations. Notice that if instead of the median \(m_1\) we consider, for example, the posterior mean \(E(\theta \vert \varvec{x} )\), which is not invariant under invertible monotonic reparametrizations, the property will not hold in general. Moreover, \(E(\theta \vert \varvec{x} )\) for some models may not even exist.

(iii):

We first examine the first part of the statement, in which \(\theta ^*=\theta _H\). Let \(\hat{\theta }\) be the maximum likelihood estimator of \(\theta\) and let \(J(\hat{\theta })\) be the observed Fisher information. Under suitable regularity and technical conditions (see for instance Section 7, p. 129 in Lindley (1965) and Section 5.3.2, p. 287 in Bernardo and Smith (1994)), the asymptotic distribution of the “normalized” random quantity \(W = \sqrt{J(\hat{\theta })}(\theta -\hat{\theta })\) is standard normal, both in the posterior, for fixed data and random \(\theta\), and in the sampling distribution, for fixed \(\theta\) and random data. We have

$$\begin{aligned} \delta _H = 1 - 2\min \{ G_1(\theta _H \vert \varvec{x}), 1- G_1(\theta _H \vert \varvec{x})\}, \end{aligned}$$
(10)

where

$$\begin{aligned} G_1(\theta _H \vert \varvec{x})= P(W \le \sqrt{J(\hat{\theta })} (\theta _H- \hat{\theta }) \mid \varvec{X}= \varvec{x}). \end{aligned}$$
(11)

Since W is asymptotically standard normal, then \(G_1(\theta _H \vert \varvec{x})\) is asymptotically \(\Phi \left( \sqrt{J(\hat{\theta })} (\theta _H- \hat{\theta }) \right)\) (a function of the data through \(\hat{\theta }\)). But also, in the sampling distribution given \(\theta ^* = \theta _H\), \(\sqrt{J(\hat{\theta })} (\theta _H- \hat{\theta })\) is asymptotically standard normal and thus, in view of the probability integral transform, \(G_1(\theta _H \vert \varvec{X})\) is asymptotically uniform on [0, 1] in this sampling distribution. Then

$$\begin{aligned} \mathbb {P}(\delta _H \le t \vert \theta _H) = \mathbb {P} \left( \frac{1}{2}(1-t) \le G_1(\theta _H \vert \varvec{X}) \le \frac{1}{2} (1+t) \vert \theta _H \right) \approx t, \end{aligned}$$

so that \(\delta _H\) is asymptotically uniform under \(\theta _H\). If, instead, \(\theta ^* \ne \theta _H\) and \(n \rightarrow \infty\), under suitable regularity conditions (see for instance Section 7, p. 129 in Lindley (1965)) it is well known that \(g_1(\theta \vert \varvec{x})\) concentrates in a neighbourhood of \(\theta ^*\) whose size is of order \(n^{-\frac{1}{2}}\). Then, from (5), since the tail event \(\theta \in I_E\) has vanishingly small probability, we have that \(\lim _{n \rightarrow \infty } \delta _H=1\).

\(\square\)
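Property (iii) can also be checked empirically. The following sketch is a minimal simulation assuming an Exponential model with mean \(\theta\) and the prior \(g_0(\theta )\propto \theta ^{-1}\), so that the posterior of \(\theta\) is Inverse-Gamma with shape n and scale \(n\bar{x}\); the sample size, the alternative true value and the seed are our own illustrative choices.

```python
# Sketch: empirical check of Proposition 1 (iii). Exponential(mean theta) model with
# prior g0(theta) proportional to 1/theta, hence posterior theta | x ~ InvGamma(n, sum(x)).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def delta_H(x, theta_H):
    G1 = stats.invgamma(a=len(x), scale=x.sum()).cdf(theta_H)   # posterior CDF at theta_H
    return 1 - 2 * min(G1, 1 - G1)

n, reps, theta_H = 100, 5000, 1.0
# theta* = theta_H: delta_H should be approximately Uniform(0, 1)
d_null = [delta_H(rng.exponential(scale=theta_H, size=n), theta_H) for _ in range(reps)]
# theta* != theta_H: delta_H should concentrate near 1
d_alt = [delta_H(rng.exponential(scale=1.5, size=n), theta_H) for _ in range(reps)]
print(np.mean(d_null), np.mean(d_alt))   # roughly 0.5 and close to 1
```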

As pointed out before, the further \(\theta _H\) is from the posterior median \(m_1\) of the distribution function \(G_1(\theta \vert \varvec{x})\), the closer \(\delta _H\) is to 1. It can then be said that H does not conform to \(G_1(\theta \vert \varvec{x})\). On the contrary, the smaller \(\delta _H\) the stronger is the evidence in favor of H. Following this idea, we can construct a procedure to evaluate (and possibly reject) the hypothesis H, using the evidence measure \(\delta _H\).

Definition 2

The Bayesian Discrepancy Test (BDT) is the procedure for evaluating a hypothesis H based on the Bayesian Discrepancy Measure (BDM).

High values of \(\delta _H\) provide strong evidence against the hypothesis H. On the other hand, if \(\delta _H\) is small, the data are consistent with H.

Summarizing, when H is true, then, for large n, \(\delta _H\) is roughly equally likely to fall anywhere between 0 and 1. By contrast, when H is false, \(\delta _H\) is more likely to be near 1 than near 0. As with other measures of evidence (such as the e-value of the Full Bayesian Significance Test or the frequentist p-value), a threshold could be chosen in order to interpret the observed value of \(\delta _H\). However, in the direction recommended in the ASA statement (see Wasserstein and Lazar (2016)) and in view of the debate on hypothesis testing (Benjamin et al. 2018; Benjamin and Berger 2019) and the recent studies about the reproducibility of experiments (Collaboration 2015; Johnson et al. 2017), we agree with Fisher (1973) that “no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas”. Given the critical points related to the choice of a threshold, we think it is important to look for an applied measure of evidence that pushes the researcher to think more about the specific problem, and that avoids the use of standard recipes.

2.2 The Bayesian discrepancy measure in presence of nuisance parameters

Suppose that \(p \ge 2\) and \(k \ge 1\). Let \(\varphi = \varphi (\varvec{\theta })\) be a scalar parameter of interest, where \(\varphi : \varvec{\Theta } \rightarrow \Phi \subseteq \mathbb {R}\). Let us further consider a bijective reparametrization \(\varvec{\theta } \Leftrightarrow (\varphi , \varvec{\zeta })\), where \(\varvec{\zeta } \in \varvec{Z} \subseteq \mathbb {R}^{p-1}\) denotes an arbitrary nuisance parameter, which is determined on the basis of analytical convenience (note that the value of the evidence measure is invariant with respect to the choice of the nuisance parameter). We consider hypotheses that can be expressed in the form

$$\begin{aligned} H: \varphi = \varphi _H, \end{aligned}$$
(12)

where \(\varphi _H\) is a known value representing the hypothesis of interest. The transformation \(\varphi\) must be such that, for all \(\varvec{\theta } \in \varvec{\Theta }\) and for all \(\varphi _H \in \Phi\), it can always be assessed whether \(\varphi\) is strictly smaller than, strictly larger than or equal to \(\varphi _H\) (i.e. either \(\varphi < \varphi _H\), \(\varphi > \varphi _H\), or \(\varphi = \varphi _H\)). Hypothesis (12) and transformation \(\varphi\) then induce the partition

$$\begin{aligned} \begin{array}{ll} {\varvec{\Theta }}_a &{}= \big \{ \varvec{\theta } \in \varvec{\Theta }: \varphi < \varphi _H \big \}\\ {\varvec{\Theta }}_H &{}= \big \{ \varvec{\theta } \in \varvec{\Theta }: \varphi = \varphi _H \big \}\\ {\varvec{\Theta }}_b &{}= \big \{ \varvec{\theta } \in \varvec{\Theta }: \varphi > \varphi _H \big \}. \end{array} \end{aligned}$$
(13)

We call any hypothesis of type (12) which identifies a partition of the form (13) a partitioning hypothesis. It is easy to verify that many commonly used hypotheses are partitioning. In this paper we only consider hypotheses of this nature. In this setting, we express the BDM as

$$\begin{aligned} \begin{array}{ll} \delta _H &{}=\; \displaystyle 1 - 2 \cdot \min _{a,b} \big \{ {\mathbb {P}}(\varvec{\theta } \in \varvec{\Theta }_a \vert \varvec{x}) \,, \, {\mathbb {P}}(\varvec{\theta } \in \varvec{\Theta }_b \vert \varvec{x}) \big \} \\ &{}=\; \displaystyle 1 - 2 \cdot \int _{I_E} g_1(\varvec{\theta } \vert \varvec{x}) \, \textrm{d}\varvec{\theta }, \, \end{array} \end{aligned}$$
(14)

where the external set is given by

$$\begin{aligned} I_E \; = \; \arg \min _{a,b} \big \{ {\mathbb {P}}(\varvec{\theta } \in \varvec{\Theta }_a \vert \varvec{x}) \,, \, {\mathbb {P}}(\varvec{\theta } \in \varvec{\Theta }_b \vert \varvec{x}) \big \} \;. \; \end{aligned}$$
(15)

In the particular scenario where the marginal posterior

$$\begin{aligned} h_1(\varphi \vert \varvec{x}) \; = \; \int _{\varphi ( \varvec{\theta })= \varphi } g_1(\varvec{\theta } \vert \varvec{x}) \textrm{d}\varvec{\theta } \;, \quad \forall \varphi \in \Phi \,, \end{aligned}$$

of the parameter of interest \(\varphi\) can be computed in a closed form, the hypothesis (12) can be easily treated using the methodologies seen in Subsection 2.1, i.e. the BDM is computed by means of formula (4) or (5) applied to the marginal.

The properties reported in Proposition 1 extend naturally to the setting just presented.

3 Illustrative examples

The simplicity of the BDT is highlighted by the following examples, some of which deal with cases not usually considered in the literature. Examples 1 and 2 focus on a scalar parameter of interest, while Examples 3, 4, 5, 6, 7 also contain nuisance parameters.

In all examples we have adopted a Jeffreys’ prior (see Yang and Berger (1996) for a catalog of non-informative priors) for simplicity. However, other objective priors and, in the presence of substantive prior information, informative priors could equally be used.

3.1 Examples of the univariate parameter case

Example 1

Exponential distribution Let \(\varvec{x}=(x_1, \dots , x_n)\) be an iid sample of size n from the Exponential distribution \(X \sim Exp\big (x \vert \theta ^{-1} \big )\), with \(\theta \in {\mathbb {R}}^+.\) We are interested in the hypothesis \(H: \theta =\theta _H\). Assuming a Jeffreys’ prior for \(\theta\), i.e. \(g_0(\theta ) \propto \theta ^{-1}\), the posterior distribution is given by \(g_1(\theta \vert \varvec{x}) \propto \theta ^{-n-1} \exp \{- n \bar{x} \cdot \theta ^{-1} \}\), with \(\bar{x}\) the sample mean.

Figure 2 shows the posterior density function as well as the discrepancy and the external intervals for \(H:\theta = \theta _H = 2.4\) and the MLE \(\bar{x} = 1.2\) for three sample sizes [A] \(n =6\), [B] \(n = 12\), [C] \(n = 24\). In [A] we have a posterior median \(m_1=1.27\) and \(\delta _H =0.832\), while in [B] \(m_1=1.23\) and \(\delta _H =0.960\), in [C] \(m_1=1.22\) and \(\delta _H =0.997\).

While in case [A] the data do not contradict H sufficiently, in case [B] there is weak evidence against H, which becomes stronger in [C].
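These values can be reproduced in a few lines; the sketch below simply evaluates the Inverse-Gamma posterior given above (the code is illustrative, not the authors’ implementation).

```python
# Sketch: Example 1. With Jeffreys' prior the posterior of theta is InvGamma(n, n*xbar).
from scipy import stats

theta_H, xbar = 2.4, 1.2
for n in (6, 12, 24):                                  # cases [A], [B], [C]
    post = stats.invgamma(a=n, scale=n * xbar)
    G1 = post.cdf(theta_H)
    delta = 1 - 2 * min(G1, 1 - G1)
    print(n, round(post.median(), 2), round(delta, 3))
# expected output close to (6, 1.27, 0.832), (12, 1.23, 0.960), (24, 1.22, 0.997)
```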

Fig. 2

Posterior density function \(g_1(\theta \vert n\bar{x})\) and intervals \(I_H = (m_1,\theta _H)\) and \(I_E = (\theta _H, \infty )\), using data from Example 1

Note that in all scenarios considered, we find the following relation between \(\delta _H\) and the p-value,

$$\begin{aligned} {\textit{p-value}}=1 - \delta _H \end{aligned}$$
(16)

(in [A] \(\delta _H = 0.832\) and p-value\(= 0.168\), in [B] \(\delta _H = 0.96\) and p-value\(= 0.04\), while in [C] \(\delta _H = 0.997\) and p-value\(= 0.003\)). This result depends clearly on the use of the Jeffreys’ prior, which is a matching prior for a scalar parameter (see Ruli and Ventura (2021)).
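Relation (16) can be checked numerically; the sketch below uses the equal-tailed p-value based on the exact pivot \(2n\bar{x}/\theta \sim \chi ^2_{2n}\), which is our reading of the classical test being matched here.

```python
# Sketch: equal-tailed p-value from the pivot 2*n*xbar/theta_H ~ chi-square(2n),
# to be compared with 1 - delta_H in cases [A], [B], [C] of Example 1.
from scipy import stats

theta_H, xbar = 2.4, 1.2
for n in (6, 12, 24):
    F = stats.chi2(df=2 * n).cdf(2 * n * xbar / theta_H)
    print(n, round(2 * min(F, 1 - F), 3))    # approximately 0.168, 0.040, 0.003
```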

Remark 1

The fact that classical and Bayesian procedures, under certain conditions, produce the same conclusions is well known (see, for instance, Lindley (1965)). The linear relationship (16) also occurs in other simple cases.

Finally, to conclude Example 1, it is useful to show the trend of the BDM when varying \(n=1,2,\dots ,25\) for six values of the MLE: \((a)\ 0.8\), \((b)\ 1.2\), \((c)\ 1.6\) (case [A]) and \((d)\ 4.0\), \((e)\ 3.6\), \((f)\ 3.2\) (case [B]), see Fig. 3. In order to explain the difference between the BDM trends in cases [A] and [B], consider that:

  1. (i)

    in case [A] the posterior median \(m_1 < \theta _H = 2.4\), whereas in case [B] \(m_1 > \theta _H = 2.4\);

  2. (ii)

    \(\delta _H\) is monotonically increasing, both with respect to n, and with respect to the distance \(\vert m_1 - \theta _H \vert\);

  3. (iii)

    the posterior \(g_1\) always has a positive asymmetry, which decreases as n increases;

  4. (iv)

    the trend difference of the BDM in cases [A] and [B] depends on the fact that the posterior \(g_1\) has ‘small’ tails on the left-hand side of \(m_1\) and ‘large’ tails on the right-hand side.

Fig. 3

BDM for n increasing and for different values of the MLE. Case [A] with MLE \(= 0.8\ ({\textbf {a}}),\ 1.2\ ({\textbf {b}}),\ 1.6\ ({\textbf {c}})\) and case [B] with MLE \(= 3.2\ ({\textbf {f}}),\ 3.6\ ({\textbf {e}}),\ 4\ ({\textbf {d}})\)

Moving forward in the discussion, in order to highlight the evaluative nature of the BDT, it is worth pointing out that it allows the separate and simultaneous testing of \(\ell \ge 2\) hypotheses

$$\begin{aligned} H_j: \, \varphi = \varphi _j, \quad j=1,2,\dots , \ell , \end{aligned}$$
(17)

as shown in Example 2. Remember that with the comparative approach, among the \(\ell\) competing hypotheses, only one is accepted. On the contrary, under the evaluative approach, it may happen that several hypotheses are supported by the data, or even that all hypotheses must be rejected.

Example 2

- Evaluation of some hypotheses made by several experts (Bernoulli distribution) In the 1700s, several hypotheses \(H_j: \theta = \theta _j\) were formulated about the birth masculinity rate \(\theta =\frac{M}{M+F}\). Among them we consider \(\theta _1 = \frac{1}{2}\) (J. Bernoulli), \(\theta _2 = \frac{13}{25}\) (J. Arbuthnot), \(\theta _3 = \frac{1050}{2050}\) (J. P. Süssmilch), \(\theta _4 = \frac{23}{45}\) (P. S. Laplace). We assume that the gender of each newborn is modeled as a \(Bin(\cdot \vert 1, \theta )\). Then, using data recorded in 1710 in London (see, for instance, Spiegelhalter (2019)), with 7640 males and 7288 females (the MLE is \(\hat{\theta } = 0.512\)) and assuming a uniform \(Beta(\theta \vert 1, 1)\) prior, we compute \(\delta _{H_j}\) using the Normal asymptotic approximation

$$\begin{aligned} \delta _{H_j} \cong 1 - 2\cdot \displaystyle \int _{I_E^j} \tilde{g}_1\big (\theta \vert \hat{\theta }, \textstyle {\frac{1}{n}}\hat{\theta }(1-\hat{\theta }) \big ) \textrm{d}\theta , \quad j=1,2,3,4, \end{aligned}$$

where \(\tilde{g}_1\) denotes the Normal density. Since \(\delta _{H_1}=0.996\), \(\delta _{H_2}=0.955\), \(\delta _{H_3}=0.079\), \(\delta _{H_4}=0.132\), we can conclude that there is sufficient evidence against the first two hypotheses, while there is not enough evidence against the hypotheses made by Süssmilch and Laplace.
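A sketch of this computation, using the Normal asymptotic approximation stated above (the implementation details are our own):

```python
# Sketch: Example 2, BDM for several precise hypotheses under the Normal
# asymptotic approximation of the posterior of theta.
import numpy as np
from scipy import stats

males, females = 7640, 7288
n = males + females
theta_hat = males / n
approx_post = stats.norm(loc=theta_hat, scale=np.sqrt(theta_hat * (1 - theta_hat) / n))

hypotheses = {"Bernoulli": 1 / 2, "Arbuthnot": 13 / 25,
              "Suessmilch": 1050 / 2050, "Laplace": 23 / 45}
for name, theta_j in hypotheses.items():
    G1 = approx_post.cdf(theta_j)
    print(name, round(1 - 2 * min(G1, 1 - G1), 3))
# approximately 0.996, 0.955, 0.079, 0.132
```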

3.2 Examples of the more general case

The examples presented hereafter can be divided into tests concerning a parameter or a parametric function of a single population, and tests concerning the comparison of parameters of two independent populations.

3.2.1 Tests involving a single population

Example 3

- Test on the shape parameter, mean and variance of the Gamma distribution Let \({\varvec{x}} = (x_1,\dots ,x_n)\) be an iid sample of size n from \(X \sim Gamma\big ( x \vert \alpha , \beta \big )\), \((\alpha , \beta ) \in {\mathbb {R}}^+ \times {\mathbb {R}}^+\). We denote by \(m_g\) the geometric mean of \(\varvec{x}\). The likelihood function for \((\alpha ,\beta )\) is given by

$$\begin{aligned}L(\alpha , \beta \vert \varvec{x}) \; \propto \; \left( \displaystyle {\frac{\beta ^\alpha }{\Gamma (\alpha )}} \cdot m_g^\alpha \cdot e^{- \bar{x}\cdot \beta } \right) ^n . \end{aligned}$$

For the fictitious data \({\varvec{x}} = (0.8, 1.1, 1.2, 1.4, 1.8, 2, 4, 5, 8)\), we find that the MLEs are \(\hat{\alpha } = 1.921\) and \(\hat{\beta } = 0.7572\).

We are interested in testing the hypotheses [A] \(H_A: \alpha = \alpha _H\), with \(\alpha _H = 2.5\), [B] \(H_B: \mu = \mu _H\), with \(\mu _H = 6\), and [C] \(H_C: \sigma ^2 = \sigma ^2_H\), with \(\sigma ^2_H = 2\), where \(\mu = \displaystyle {\frac{\alpha }{\beta }}\) and \(\sigma ^2 = \displaystyle {\frac{\alpha }{\beta ^2}}\) denote the mean and the variance of X.

We suppose that the parameters \(\alpha \text { and } \beta\) are independent and we assume the Jeffreys’ prior for them (see Yang and Berger (1996)), i.e. \(g_0 ( \alpha , \beta ) \; = \; g_0^\alpha ( \alpha ) \cdot g_0^\beta ( \beta )\) where \(g_0^\alpha ( \alpha )\propto \sqrt{\alpha \cdot \psi ^{(1)} (\alpha ) - 1}\), \(g_0^\beta ( \beta ) \propto {\frac{1}{\beta }}\), and \(\psi ^{(1)} (\alpha ) = \sum _{j=0}^\infty (\alpha + j)^{-2}\) denotes the trigamma function. Then, the posterior for \((\alpha , \beta )\) is given by \(g_1(\alpha , \beta \mid {\varvec{x}}) \; = \; k\cdot g_0^\alpha ( \alpha ) \cdot g_0^\beta ( \beta ) \cdot L(\alpha , \beta \vert \varvec{x}),\) with normalizing constant k.

  • Case [A] The hypothesis \(H_A\) identifies the vertical straight line of equation \(\alpha = \alpha _H\) and two subsets \({\varvec{\Theta }}_a = \{ (\alpha , \beta ): \alpha < \alpha _H \}\) and \({\varvec{\Theta }}_b = \{ (\alpha , \beta ): \alpha > \alpha _H \}\) (see Fig. 4 [A]). Then we can compute

    $$\begin{aligned} {\mathbb {P}} \big ( (\alpha , \beta ) \in {\varvec{\Theta }}_b\ \vert \ \varvec{x}\big )&= \displaystyle \int _{\alpha _H}^\infty \int _0^\infty g_1(\alpha , \beta \mid {\varvec{x}}) \, \textrm{d} \beta \, \textrm{d}\alpha \\&= \displaystyle k \cdot \int _{\alpha _H}^\infty \int _0^\infty \sqrt{\alpha \cdot \psi ^{(1)} (\alpha ) - 1} \cdot {\frac{1}{\beta }} \left( \displaystyle {\frac{\beta ^\alpha }{\Gamma (\alpha )}} \cdot m_g^\alpha \cdot e^{- \bar{x}\cdot \beta } \right) ^n \, \textrm{d} \beta \, \textrm{d}\alpha \\&= \displaystyle k \cdot \int _{\alpha _H}^\infty \sqrt{\alpha \cdot \psi ^{(1)} (\alpha ) - 1} \cdot {\frac{\Gamma (n\alpha )}{\Gamma (\alpha )^n}} \cdot \left( {\frac{m_g}{n\, \bar{x}}} \right) ^{n\alpha } \, \textrm{d} \alpha \; = \; 0.215 \,, \end{aligned}$$

    and \(\delta _H = 0.570\), indicating that there is not enough evidence against \(H_A\) (a numerical check of this one-dimensional reduction is sketched after this list).

  • Case [B] The hypothesis \(H_B\) identifies the straight line of equation \(\beta = {\frac{1}{\mu _{H}}} \alpha\) in the \(\alpha \beta\)-plane (see Fig. 4 [B]) and the two subsets

    $$\begin{aligned} {\varvec{\Theta }}_c = \big \{ (\alpha , \beta ): \beta > {\frac{1}{\mu _{H}}} \alpha \big \} \quad \text {and} \quad {\varvec{\Theta }}_d= \big \{ (\alpha , \beta ): \beta < {\frac{1}{\mu _{H}}} \alpha \big \}. \end{aligned}$$

    We have

    $$\begin{aligned} {\mathbb {P}} \big ( (\alpha ,\beta ) \in {\varvec{\Theta }}_d\ \vert \ \varvec{x} \big ) \; = \; \displaystyle \int _{{\varvec{\Theta }}_d} g_1(\alpha , \beta \mid {\varvec{x}}) \, \textrm{d}\alpha \, \textrm{d} \beta \; = \; 0.012 \,, \end{aligned}$$

    and, since \(\delta _H = 0.976\), we have strong evidence against \(H_B\).

  • Case [C] The hypothesis \(H_C\) identifies the parabola of equation \(\beta = {\frac{1}{\sqrt{\sigma ^2_{H}}}} \sqrt{\alpha },\) in the \(\alpha \beta\)-plane (see Fig. 4 [C]), and the two subsets

    $$\begin{aligned} {\varvec{\Theta }}_e = \big \{ (\alpha ,\beta ): \beta > {\frac{1}{\sqrt{\sigma ^2_{H}}}} \sqrt{\alpha } \big \} \quad \text {and} \quad {\varvec{\Theta }}_f= \big \{ (\alpha , \beta ): \beta < {\frac{1}{\sqrt{\sigma ^2_{H}}}} \sqrt{\alpha } \big \}. \end{aligned}$$

    We have

    $$\begin{aligned} {\mathbb {P}} \big ( (\alpha , \beta ) \in {\varvec{\Theta }}_e \ \vert \ \varvec{x} \big ) \; = \; \displaystyle \int _{{\varvec{\Theta }}_e} g_1(\alpha , \beta \mid {\varvec{x}}) \, \textrm{d}\alpha \, \textrm{d} \beta \; = \; 0.078 \,. \end{aligned}$$

    Therefore \(\delta _H = 0.846\), and so we do not have strong evidence against \(H_C\).
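The one-dimensional reduction displayed in Case [A] can be evaluated numerically; the sketch below works on the log scale for stability, and the integration bounds are our own choices (the integrand is negligible outside them).

```python
# Sketch: Example 3, case [A]. beta has been integrated out analytically (see the
# derivation above), leaving a one-dimensional integral in alpha.
import numpy as np
from scipy.special import gammaln, polygamma
from scipy.integrate import quad

x = np.array([0.8, 1.1, 1.2, 1.4, 1.8, 2.0, 4.0, 5.0, 8.0])
n, xbar, mg = len(x), x.mean(), np.exp(np.log(x).mean())
alpha_H = 2.5

def integrand(a):
    # sqrt(alpha*trigamma(alpha) - 1) * Gamma(n*alpha)/Gamma(alpha)^n * (mg/(n*xbar))^(n*alpha)
    log_val = gammaln(n * a) - n * gammaln(a) + n * a * (np.log(mg) - np.log(n * xbar))
    return np.sqrt(a * polygamma(1, a) - 1.0) * np.exp(log_val)

lower = quad(integrand, 1e-6, alpha_H)[0]     # unnormalized P(alpha < alpha_H | x)
upper = quad(integrand, alpha_H, 60.0)[0]     # unnormalized P(alpha > alpha_H | x)
p_b = upper / (lower + upper)
print(round(p_b, 3), round(1 - 2 * min(p_b, 1 - p_b), 3))   # roughly 0.215 and 0.570
```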

Fig. 4

Posterior density function \(g_1(\alpha , \beta \vert {\varvec{x}})\) from Example 3 and corresponding sets of the induced partition in the cases [A], [B] and [C]

Example 4

- Test on the coefficient of variation for a Normal distribution Given an iid sample \({\varvec{x}} = (x_1,\dots ,x_n)\) from \(X \sim N \big ( x \vert \mu , \phi ^{-1} \big )\), the parameter of interest is \(\psi = \displaystyle {\frac{\sqrt{Var(X)}}{\mid \mathbb {E}(X) \mid }} =\displaystyle {\frac{1}{\mid \mu \mid \sqrt{\phi }} }\). We are interested in testing the hypothesis

$$\begin{aligned} H: \psi = \psi _H, \end{aligned}$$

with \(\psi _H = 0.1\). If we consider the Jeffreys’ prior \(g_0(\mu ,\phi ) \propto \phi ^{-1} \cdot {\textbf{1}}_{\mathbb {R}\times \mathbb {R}^+},\) the posterior distribution is the Normal-Gamma density

$$\begin{aligned} (\mu ,\phi ) \mid \varvec{x} \; \sim \; NG \big ( \mu , \phi \mid \eta , \nu , \alpha , \beta \big ) \,,\, \end{aligned}$$

with hyperparameters \(( \eta , \nu , \alpha , \beta )\), where \(\eta = \bar{x}\), \(\nu = n\), \(\alpha = {\frac{1}{2}}(n-1)\), \(\beta = {\frac{1}{2}}n s^2\), and density

$$\begin{aligned} g_1( \mu , \phi \mid \eta , \nu , \alpha , \beta )=\frac{\beta ^{\alpha }\sqrt{\nu } }{\Gamma (\alpha )\sqrt{2 \pi }} \phi ^{\alpha -1/2} e^{-\frac{\nu \phi }{2} (\mu - \eta )^2}e^{-\beta \phi }. \end{aligned}$$

We consider the particular case in which \(\bar{x} = 17\) and \(s^2 = 1.6\) (so that the MLE is \(\hat{\psi } = 0.074\)) with two samples of size \(n= 10\) (Fig. 5 [A]) and \(n= 40\) (Fig. 5 [B]). In the \(\mu \phi\)-space, the hypothesis H is represented by the curve \(\phi = \displaystyle {\frac{1}{\psi _H^2 }\mu ^{-2}}\) and determines the subsets \({\varvec{\Theta }}_a\) and \({\varvec{\Theta }}_b\) visualized in Fig. 5.

Fig. 5

Test on the coefficient of variation \(\psi\) of a Gaussian population. Data refers to Example 4. In the plots, the sets \({\varvec{\Theta }}_a\), \({\varvec{\Theta }}_b\) and \({\varvec{\Theta }}_H\) are reported for \(n=10\) ([A]) and \(n=40\) ([B])

In case [A] we have

$$\begin{aligned} {\mathbb {P}} \big ( (\mu , \phi ) \in {\varvec{\Theta }}_b\ \vert \ \varvec{x} \big )= \int _{{\varvec{\Theta }}_b} g_1( \mu , \phi \mid \eta , \nu , \alpha , \beta ) \, \textrm{d}\mu \, \textrm{d} \phi = 0.215, \end{aligned}$$

where \(g_1( \mu , \phi \mid \eta , \nu , \alpha , \beta )\) is the Normal-Gamma density, so that \(\delta _H = 0.570\) and there is not enough evidence against H. In case [B], we have \({\mathbb {P}} \big ( (\mu , \phi ) \in {\varvec{\Theta }}_b \ \vert \ \varvec{x} \big ) = 0.014\) and, since \(\delta _H = 0.972\), there is strong evidence against H. Therefore in such a case, with different sample sizes, the inferential conclusions change (Fig. 6).
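Both probabilities above can be approximated by Monte Carlo sampling from the Normal-Gamma posterior; the following sketch is illustrative (the number of draws and the seed are arbitrary choices of ours).

```python
# Sketch: Example 4. Monte Carlo approximation of the BDM for the coefficient of
# variation psi = 1/(|mu| * sqrt(phi)), sampling (mu, phi) from the Normal-Gamma posterior.
import numpy as np

rng = np.random.default_rng(1)
xbar, s2, psi_H, draws = 17.0, 1.6, 0.1, 1_000_000

for n in (10, 40):                                     # cases [A] and [B]
    eta, nu, alpha, beta = xbar, n, (n - 1) / 2, n * s2 / 2
    phi = rng.gamma(shape=alpha, scale=1 / beta, size=draws)  # phi | x ~ Gamma(alpha, rate=beta)
    mu = rng.normal(loc=eta, scale=1 / np.sqrt(nu * phi))     # mu | phi, x ~ N(eta, 1/(nu*phi))
    psi = 1 / (np.abs(mu) * np.sqrt(phi))
    p = np.mean(psi > psi_H)
    print(n, round(1 - 2 * min(p, 1 - p), 3))   # roughly 0.57 (n = 10) and 0.97 (n = 40)
```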

Example 5

- Test on the skewness coefficient of the Inverse Gaussian distribution Let us consider an Inverse Gaussian random variable X with density

$$\begin{aligned} f(x \mid \mu , \nu )=\sqrt{\frac{\nu }{2\pi x^3}} \exp {\Big \{ -\frac{\nu }{2} \Big (\frac{x-\mu }{\mu \sqrt{x}}\Big )^2\Big \}} \cdot {\textbf {1}}_{\mathbb {R}^+}(x), \end{aligned}$$

where \((\mu ,\nu ) \in \mathbb {R}^+ \times \mathbb {R}^+\). The parameter of interest is the skewness coefficient \(\gamma = 3 \sqrt{\frac{\mu }{\nu }}\), and we wish to test the hypothesis \(H: \gamma =\gamma _H\), where \(\gamma _H = 2\). The Jeffreys’ prior is

$$\begin{aligned} g_0(\mu ,\nu ) \propto \frac{1}{\sqrt{\mu ^3 \nu }} \cdot {\textbf {1}}_{\mathbb {R}^+\times \mathbb {R}^+}(\mu ,\nu ). \end{aligned}$$

Given n observations, the posterior distribution of \((\mu , \nu )\) is

$$\begin{aligned} g_1(\mu , \nu \vert \varvec{x}) \; \propto \; \sqrt{\frac{\nu ^{n-1}}{\mu ^3}} \cdot \exp { \left\{ - \frac{n \ \nu }{2}\cdot \left( \frac{\bar{x}}{\mu ^2} - \frac{2}{\mu } + \frac{1}{a} \right) \right\} } \cdot {\textbf {1}}_{\mathbb {R}^+\times \mathbb {R}^+}(\mu ,\nu ), \end{aligned}$$

where \(\bar{x}\) and a denote the arithmetic and the harmonic mean of the data, respectively.

We apply the procedure to the following rainfall data (inches) analyzed in Folks and Chhikara (1978) (p. 272):

$$\begin{aligned} \begin{array}{ccccccc} 1.01 &{}\quad 1.11 &{}\quad 1.13 &{}\quad 1.15 &{}\quad 1.16\\ 1.17 &{}\quad 1.17 &{}\quad 1.20 &{}\quad 1.52 &{}\quad 1.54\\ 1.54 &{}\quad 1.57 &{}\quad 1.64 &{}\quad 1.73 &{}\quad 1.79\\ 2.09 &{}\quad 2.09 &{}\quad 2.57 &{}\quad 2.75 &{}\quad 2.93\\ 3.19 &{}\quad 3.54 &{}\quad 3.57 &{}\quad 5.11 &{}\quad 5.62. \end{array} \end{aligned}$$

The hypothesis identifies in the parameter space \(\varvec{\Theta } = \mathbb {R}^+ \times \mathbb {R}^+\) the subsets

$$\begin{aligned} \begin{array}{ll} \varvec{\Theta }_a \;=&{} \Big \{(\mu ,\nu )\in \varvec{\Theta }: 3\sqrt{\frac{\mu }{\nu }} < \gamma _H \Big \} \,, \\ \varvec{\Theta }_H \;=&{} \Big \{(\mu ,\nu )\in \varvec{\Theta }: 3 \sqrt{\frac{\mu }{\nu }} = \gamma _H \Big \} \,, \\ \varvec{\Theta }_b \;=&{} \Big \{(\mu ,\nu )\in \varvec{\Theta }: 3\sqrt{\frac{\mu }{\nu }} > \gamma _H \Big \} \,. \end{array} \end{aligned}$$

We have that

$$\begin{aligned} \mathbb {P}\big ((\mu ,\nu ) \in \varvec{\Theta }_b \ \vert \ \varvec{x} \big ) \; = \; \int _{\varvec{\Theta }_b} g_1(\mu , \nu \vert \varvec{x}) \ \textrm{d}\mu \; \textrm{d}\nu = 0.078\,, \end{aligned}$$
(18)

(see Fig. 6); we then obtain \(\delta _H = 0.844\). This result indicates that we do not have enough evidence against the hypothesis H.
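One way to evaluate (18) numerically is to integrate \(\nu\) out in closed form (a Gamma integral over \(\nu < 9\mu /\gamma _H^2\)) and then integrate over \(\mu\) by quadrature; the sketch below follows this route, which is our own reduction rather than the computation used for the figure.

```python
# Sketch: Example 5. nu is integrated out analytically; the remaining integrals in mu
# are computed by quadrature (finite bounds chosen where the integrand is negligible).
import numpy as np
from scipy.special import gammainc
from scipy.integrate import quad

x = np.array([1.01, 1.11, 1.13, 1.15, 1.16, 1.17, 1.17, 1.20, 1.52, 1.54,
              1.54, 1.57, 1.64, 1.73, 1.79, 2.09, 2.09, 2.57, 2.75, 2.93,
              3.19, 3.54, 3.57, 5.11, 5.62])
n, xbar, a = len(x), x.mean(), 1 / np.mean(1 / x)      # arithmetic and harmonic means
gamma_H, s = 2.0, (n + 1) / 2                          # s: Gamma shape after integrating nu

def c(mu):                                             # rate of the Gamma integral in nu
    return 0.5 * n * (xbar / mu**2 - 2 / mu + 1 / a)

marg = lambda mu: mu**-1.5 * c(mu)**-s                               # unnormalized marginal in mu
num = lambda mu: marg(mu) * gammainc(s, c(mu) * 9 * mu / gamma_H**2)  # region nu < 9*mu/gamma_H^2

p_b = quad(num, 1e-6, 50.0)[0] / quad(marg, 1e-6, 50.0)[0]
print(round(p_b, 3), round(1 - 2 * min(p_b, 1 - p_b), 3))   # roughly 0.078 and 0.844
```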

Fig. 6

Test on the skewness of the Inverse Gaussian distribution with \(\gamma _H = 2\). In the plot the sets of the partition induced by H are reported. Data refers to Example 5

3.2.2 Tests involving two independent populations

In this section we consider some examples concerning comparisons between parameters of two independent populations.

Example 6

- Comparison between means and precisions of two independent Normal populations Let us consider a case study on the dating of the core and periphery of some wooden furniture, found in a Byzantine church, using radiocarbon (see Casella and Berger (2001), p. 409). The historians wanted to verify whether the mean age of the core is the same as the mean age of the periphery, using two samples of sizes \(m =14\) and \(n = 9\), respectively, given by

$$\begin{array}{*{11}{c}} \text {core} & 1294 & 1279 & 1274 & 1264 & 1263 & 1254 & 1251 & 1251 & 1248 & 1240 \\ & 1232 & 1220 & 1218 & 1210 & & & & & & \\ \text {periphery} & 1284 & 1272 & 1256 & 1254 & 1242 & 1274 & 1264 & 1256 & 1250 & \end{array}$$

We assume that the age of the core X and of the periphery Y are distributed as

$$\begin{aligned} X \sim N(x \vert \mu _1,\phi _1^{-1}) \quad \textrm{and} \quad Y \sim N(y\vert \mu _2,\phi _2^{-1}), \end{aligned}$$

where \(Var(X) = \phi _1^{-1}\) and \(Var(Y) = \phi _2^{-1}\), and we assume that the data are iid conditional on the parameters. We consider for \((\mu _i,\phi _i)\) the Jeffreys’ prior

$$\begin{aligned} g_0^i(\mu _i,\phi _i) \propto \phi _i^{-1} \cdot {\textbf{1}}_{\mathbb {R}\times \mathbb {R}^+} \;, \; i=1,2 \,. \end{aligned}$$

We obtain \(\bar{x} = 1249.86\), \(\bar{y} = 1261.33\), \(\bar{d} = \bar{x} - \bar{y} = -11.48\), while the MLEs for the sample standard deviations are \(s_1 = 23.43\) and \(s_2 = 12.51.\) The posterior distribution for \((\mu _i,\phi _i)\) is the Normal-Gamma law

$$\begin{aligned} (\mu _i,\phi _i) \mid \varvec{x}, \varvec{y} \; \sim \; NG \big ( \mu _i, \phi _i \mid \eta _i, \nu _i, \alpha _i, \beta _i \big ) \,, \; i=1,2, \, \end{aligned}$$

with hyperparameters \(\eta _1 = \bar{x}\), \(\nu _1 = m\), \(\alpha _1 = {\frac{1}{2}}(m-1)\), \(\beta _1 = {\frac{1}{2}}m s_1^2\)   and   \(\eta _2 = \bar{y}\), \(\nu _2 = n\), \(\alpha _2 = {\frac{1}{2}}(n-1)\), \(\beta _2 = {\frac{1}{2}}n s_2^2\), and density

$$\begin{aligned} g^i_1( \mu _i, \phi _i \mid \eta _i, \nu _i, \alpha _i, \beta _i)=\frac{\beta _i^{\alpha _i}\sqrt{\nu _i} }{\Gamma (\alpha _i)\sqrt{2 \pi }} \phi _i^{\alpha _i-1/2} e^{-\frac{\nu _i \phi _i}{2} (\mu _i- \eta _i)^2}e^{-\beta _i\phi _i}, \; i=1,2. \end{aligned}$$

The hypothesis of interest

$$\begin{aligned} H_A: \mu _1 -\mu _2 = 0, \quad \forall \phi _1> 0, \quad \forall \phi _2 > 0, \end{aligned}$$

identifies the following subsets in the parameter space

$$\begin{aligned} \begin{array}{ll} {\varvec{\Theta }}_a &{}= \big \{ (\mu _1, \mu _2, \phi _1, \phi _2) \in {\mathbb {R}}^2 \times {\mathbb {R}}_+^2: \mu _1 < \mu _2 \big \},\\ {\varvec{\Theta }}_{H_A} &{}= \big \{ (\mu _1, \mu _2, \phi _1, \phi _2) \in {\mathbb {R}}^2 \times {\mathbb {R}}_+^2: \mu _1 = \mu _2 \big \},\\ {\varvec{\Theta }}_b &{}= \big \{ (\mu _1, \mu _2, \phi _1, \phi _2) \in {\mathbb {R}}^2 \times {\mathbb {R}}_+^2: \mu _1 > \mu _2 \big \}. \end{array} \end{aligned}$$

Then we can compute

$$\begin{aligned} {\mathbb {P}}\big ( (\mu _1, \mu _2, \phi _1, \phi _2) \in {\varvec{\Theta }}_a \ \vert \ \varvec{x}, \varvec{y}\big )&= \int _{{\varvec{\Theta }}_a} \prod _{i=1}^{2} g^i_1( \mu _i, \phi _i \mid \eta _i, \nu _i, \alpha _i, \beta _i) \, \textrm{d}\mu _1 \, \textrm{d}\mu _2 \, \textrm{d} \phi _1 \, \textrm{d} \phi _2 \\&= \int _{\mu _1< \mu _2} \prod _{i=1}^{2} \frac{\Gamma (\alpha _i+\frac{1}{2})}{\Gamma (\alpha _i)}\Big (\frac{\nu _i}{2 \pi \beta _i}\Big )^{1/2} \Big [1+\frac{\nu _i}{2 \beta _i}(\mu _i-\eta _i)^2\Big ]^{-(\alpha _i+\frac{1}{2})} \, \textrm{d}\mu _1 \, \textrm{d}\mu _2 \\&= 0.089 \,, \end{aligned}$$

so we have \(\delta _H = 0.823\), a value that does not indicate evidence against the hypothesis. We exploited the fact that the marginal of each \(\mu _i\) is a Generalized Student’s t-distribution (denoted by StudentG) with hyperparameters \(\big ( \eta _i, \frac{ \nu _i\alpha _i}{\beta _i}, 2 \alpha _i \big )\).
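Since the marginal posterior of each \(\mu _i\) is a location-scale Student's t, the two probabilities entering (9) can also be approximated by plain Monte Carlo; in the sketch below we read the generalized Student's t with hyperparameters \(\big ( \eta _i, \frac{\nu _i\alpha _i}{\beta _i}, 2\alpha _i \big )\) as a t with \(2\alpha _i\) degrees of freedom, location \(\eta _i\) and scale \(\sqrt{\beta _i/(\nu _i\alpha _i)}\); seed and number of draws are arbitrary.

```python
# Sketch: Example 6, comparison of the means. Monte Carlo draws from the two
# independent Student's t marginals of mu_1 (core) and mu_2 (periphery).
import numpy as np

rng = np.random.default_rng(2)
draws = 1_000_000

def draw_mu(xbar, n, s2, size):
    eta, nu, alpha, beta = xbar, n, (n - 1) / 2, n * s2 / 2
    scale = np.sqrt(beta / (nu * alpha))               # scale of the Student's t marginal
    return eta + scale * rng.standard_t(df=2 * alpha, size=size)

mu1 = draw_mu(1249.86, 14, 23.43**2, draws)
mu2 = draw_mu(1261.33, 9, 12.51**2, draws)
p_min = min(np.mean(mu1 < mu2), np.mean(mu1 > mu2))
print(round(1 - 2 * p_min, 3))                         # roughly 0.82
```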

Fig. 7

Comparisons between means ([A]) and precisions ([B]) of independent normal populations for data in Example 6. For both cases we show the contour plots of the marginals of \(\mu _j\) ([A]) and \(\phi _j\) ([B]), and the partition sets associated with the corresponding hypotheses

Figure 7 [A] in the space \((\mu _1, \mu _2)\) shows the contour lines of the distribution

$$\begin{aligned} StudentG \Big ( \mu _1 \big \vert \, \eta _1, {\frac{\nu _1 \cdot \alpha _1}{\beta _1}}, 2\alpha _1 \Big ) \cdot StudentG \Big ( \mu _2 \big \vert \, \eta _2, {\frac{\nu _2 \cdot \alpha _2}{\beta _2}}, 2\alpha _2 \Big ). \end{aligned}$$

Note that the homoscedasticity assumption is not necessary. Consider now the hypothesis

$$\begin{aligned} H_B: \phi _1 - \phi _2 =0, \quad \forall \mu _1, \mu _2, \end{aligned}$$

which determines in the parameter space the subsets

$$\begin{aligned} \begin{array}{ll} {\varvec{\Theta }}_c &{}= \big \{ (\mu _1, \mu _2, \phi _1, \phi _2) \in {\mathbb {R}}^2 \times {\mathbb {R}}_+^2: \phi _1 < \phi _2 \big \}, \\ {\varvec{\Theta }}_{H_B} &{}= \big \{ (\mu _1, \mu _2, \phi _1, \phi _2) \in {\mathbb {R}}^2 \times {\mathbb {R}}_+^2: \phi _1 = \phi _2 \big \}, \\ {\varvec{\Theta }}_d &{}= \big \{ (\mu _1, \mu _2, \phi _1, \phi _2) \in {\mathbb {R}}^2 \times {\mathbb {R}}_+^2: \phi _1 > \phi _2 \big \}. \end{array} \end{aligned}$$

We have

$$\begin{aligned} {\mathbb {P}} \big ( (\mu _1, \mu _2, \phi _1, \phi _2) \in {\varvec{\Theta }}_c \ \vert \ \varvec{x}, \varvec{y} \big ) = \int _{\phi _1< \phi _2} \prod _{i=1}^{2} \frac{\beta _i^{\alpha _i}}{\Gamma (\alpha _i)}\phi _i^{\alpha _i-1}e^{-\phi _i\beta _i} \, \textrm{d}\phi _1 \, \textrm{d}\phi _2 = 0.046, \end{aligned}$$

from which it follows that \(\delta _H = 0.908\) and there is strong evidence against the hypothesis \(H_B\). To compute the integral we have used the fact that the marginal of each \(\phi _i\) has a Gamma distribution with parameters \((\alpha _i, \beta _i), \ i=1,2\).

The contour lines of the law \(Gamma ( \phi _1 \vert \alpha _1, \beta _1 ) \cdot Gamma ( \phi _2 \vert \alpha _2, \beta _2),\) in the space \((\phi _1, \phi _2)\), are reported in Figure 7 [B].

Example 7

- Comparison of the shape parameters of two Gamma distributions Let us consider two independent Gamma populations \(X_i \sim Gamma \big ( \alpha _i, \beta _i \big ),\) \((\alpha _i, \beta _i) \in {\mathbb {R}}^+ \times {\mathbb {R}}^+\), \(i=1,2\), and two iid samples of sizes \(n_1=9\) and \(n_2=12\), respectively, with sample means \(\bar{x}_1 = 2.811\) and \(\bar{x}_2 = 1.973\), and geometric means \(m_{g_1} = 2.116\) and \(m_{g_2} = 1.327\).

We are interested in testing \(H: \alpha _1 = \alpha _2.\) The posterior distribution for \((\alpha _1, \beta _1, \alpha _2, \beta _2)\) is given by

Fig. 8

Comparison of the shape parameters of two independent Gamma populations, using data of Example 7. The sets \(\Theta _a, \Theta _b\) and \(\Theta _H\) of the partition are reported

$$\begin{aligned} g_1( \alpha _1, \beta _1, \alpha _2, \beta _2 \vert {\varvec{x}_1}, {\varvec{x}_2})= g_1^1 (\alpha _1, \beta _1 \vert {\varvec{x}_1}) \cdot g_1^2 (\alpha _2, \beta _2 \vert {\varvec{x}_2} ) \,, \end{aligned}$$

where

$$\begin{aligned} g_1^i(\alpha _i, \beta _i \vert {\varvec{x}_i} ) = k_i \cdot g_0 ( \alpha _i, \beta _i ) \cdot L(\alpha _i, \beta _i \mid {\varvec{x}_i}), \end{aligned}$$

with normalizing constant \(k_i\), \(i=1,2\). Let \(\varvec{\Theta }_a = \big \{(\alpha _1,\alpha _2) \in \mathbb {R}^+ \times \mathbb {R}^+: \alpha _1 > \alpha _2\big \}\) and \(\varvec{\Theta }_b = \big \{(\alpha _1,\alpha _2) \in \mathbb {R}^+ \times \mathbb {R}^+: \alpha _1 < \alpha _2\big \}\) (see Figure 8). In order to test the hypothesis H, we compute the probability

$$\begin{aligned} {\mathbb {P}} ( (\alpha _1,\alpha _2) \in \varvec{\Theta }_b \ \vert \ {\varvec{x}_1}, {\varvec{x}_2} )&= \displaystyle \int _{\alpha _1< \alpha _2} \int _{{\mathbb {R}}^+ \times {\mathbb {R}}^+} g_1^1 (\alpha _1 , \beta _1 \vert {\varvec{x}_1} ) \cdot g_1^2(\alpha _2 ,\beta _2 \vert {\varvec{x}_2} ) \, \textrm{d} \beta _1 \, \textrm{d} \beta _2 \, \textrm{d}\alpha _1 \, \textrm{d}\alpha _2 \\&= \displaystyle \int _{\alpha _1 < \alpha _2} \prod _{i=1}^2 k_i \cdot g_0^\alpha (\alpha _i) \cdot {\frac{\Gamma (n_i \alpha _i)}{\Gamma (\alpha _i)^{n_i}}} \cdot \left( \frac{{m_{g_i}}}{n_i \bar{x}_i} \right) ^{n_i \alpha _i} \textrm{d}\alpha _1 \, \textrm{d}\alpha _2 = 0.311 \end{aligned}$$

and, since \(\delta _H = 0.378\), there is evidence in favour of H.
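As in Example 3, after integrating out each \(\beta _i\) the marginal posterior of \(\alpha _i\) depends on the data only through \(n_i\), \(\bar{x}_i\) and \(m_{g_i}\); the grid-based sketch below approximates \({\mathbb {P}}(\alpha _1 < \alpha _2 \vert \varvec{x}_1, \varvec{x}_2)\) (grid range and resolution are our own choices).

```python
# Sketch: Example 7. Marginal posteriors of alpha_1 and alpha_2 evaluated on a grid,
# then P(alpha_1 < alpha_2 | data) obtained from the resulting discrete approximations.
import numpy as np
from scipy.special import gammaln, polygamma

def log_marginal(a, n, xbar, mg):
    # log of the unnormalized marginal posterior of the Gamma shape parameter
    return (0.5 * np.log(a * polygamma(1, a) - 1.0)
            + gammaln(n * a) - n * gammaln(a)
            + n * a * (np.log(mg) - np.log(n * xbar)))

grid = np.linspace(1e-3, 20, 20_000)
lm1 = log_marginal(grid, 9, 2.811, 2.116)
lm2 = log_marginal(grid, 12, 1.973, 1.327)
w1 = np.exp(lm1 - lm1.max()); w1 /= w1.sum()
w2 = np.exp(lm2 - lm2.max()); w2 /= w2.sum()

p_b = float(np.sum(w2 * np.cumsum(w1)))                # approx P(alpha_1 < alpha_2 | data)
print(round(p_b, 3), round(1 - 2 * min(p_b, 1 - p_b), 3))   # roughly 0.31 and 0.38
```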

4 Comparison with the FBST

In this section we present a comparison of the BDT with the Full Bayesian Significance Test (FBST) as presented in Pereira and Stern (2020), which provides an overview of the e-value.

In order to facilitate the discussion, let us briefly review the definition of the e-value and the related testing procedure. The FBST can be used with any standard parametric statistical model, where \(\varvec{\theta } \in \Theta \ \subseteq \ \mathbb {R}^p\). It tests a sharp hypothesis H which identifies the null set \(\Theta _{H}\). The conceptual approach of the FBST consists of determining the e-value that represents the Bayesian evidence against H. To construct this measure, the authors introduce the posterior surprise function and its supremum, given respectively by

$$\begin{aligned} s(\varvec{\theta }) = \frac{g_1(\varvec{\theta } \vert \varvec{x})}{r(\varvec{\theta })} \quad \text { and }\quad s^*=s(\varvec{\theta }^*) = \sup _{\varvec{\theta } \in \Theta _{H}} s(\varvec{\theta }), \end{aligned}$$

where \(r(\varvec{\theta })\) is a suitable reference function to be chosen. Then, the tangential set to the sharp hypothesis H, also called the Highest Relative Surprise Set (HRSS), is defined as

$$\begin{aligned} \overline{T}(s^*) = \{\varvec{\theta } \in \Theta \vert s(\varvec{\theta }) > s^*\}, \end{aligned}$$

which includes all parameter values \(\varvec{\theta }\) that attain a surprise function value larger than the supremum \(s^*\) over the null set. Finally, the e-value, which represents the Bayesian evidence against H, is defined as

$$\begin{aligned} \overline{ev}(H) = \overline{W}(s^*) = \int _{\overline{T}(s^*)} g_1(\varvec{\theta } \vert \varvec{x}) \ d\varvec{\theta }. \end{aligned}$$

Correspondingly, the e-value in support of H is \(ev(H) = 1 - \overline{ev}(H)\), which is evaluated by means of the set \(T(s^*)=\Theta \setminus \overline{T}(s^*)\) and the cumulative surprise function \(W(s^*) = 1 - \overline{W}(s^*)\). In conclusion, the FBST is the procedure that rejects H whenever \(\overline{ev}(H)\) is large.

As pointed out in Pereira and Stern (2020) (Section 3.2), “the role of the reference density is to make \(\overline{ev}(H)\) explicitly invariant under suitable transformations of the coordinate system”. A first non-invariant definition of this measure, which corresponds to the use of a flat reference function \(r(\theta )\propto 1\) in the second formulation, was given in Pereira and Stern (1999). The first version involved the determination of the tangential set \(\overline{T}\) starting only from the posterior distribution, whereas in the second a corrective element has been introduced by also including the reference function. Some of the suggested choices for the reference function are uninformative priors such as “the uniform, maximum entropy densities, or Jeffreys’ invariant prior” (see Pereira and Stern (2020), Section 3.2).
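For a unimodal scalar posterior and a flat reference function, the tangential set is just the highest-posterior-density region whose boundary density equals \(g_1(\theta _H \vert \varvec{x})\). The sketch below is an illustrative reimplementation of this special case (it is not the ‘fbst’ package), applied to the posterior of case [A] in Example 1.

```python
# Sketch: e-value with a flat reference function for a unimodal scalar posterior.
# The tangential set is {theta : g1(theta | x) > g1(theta_H | x)}; its second
# endpoint is found by root search on the other side of the mode.
from scipy import stats, optimize

post = stats.invgamma(a=6, scale=7.2)        # Example 1, case [A]: n = 6, xbar = 1.2
mode = 7.2 / (6 + 1)                         # mode of InvGamma(a, scale) is scale / (a + 1)

def ev_bar(theta_H):
    s_star = post.pdf(theta_H)
    f = lambda t: post.pdf(t) - s_star
    if theta_H >= mode:
        lo, hi = optimize.brentq(f, 1e-8, mode), theta_H
    else:
        lo, hi = theta_H, optimize.brentq(f, mode, 1e3)
    return post.cdf(hi) - post.cdf(lo)       # posterior probability of the tangential set

print(ev_bar(2.4))                           # evidence against H: theta = 2.4
```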

4.1 Similarities and differences between the procedures

The most striking similarity between the FBST and the BDT is that both tests, fully accepting the likelihood principle and relying on the posterior distribution of the parameter \(\varvec{\theta } \in \varvec{\Theta }\), are clearly Bayesian.

Another important similarity is that, asymptotically, both tests lead to the rejection of the hypothesis H when it is false (i.e. when \(\theta _H \ne \theta ^*\), where \(\theta ^*\) is the true value of the parameter). On the contrary, if \(\theta ^*=\theta _H\) they have a different asymptotic behaviour (see Proposition 1 for the BDM and Section 3.4 in Pereira and Stern (2020) for the e-value).

Certainly, the FBST has a more general reach than the BDT. Indeed, it examines the entire class of sharp hypotheses, whereas the extension of the BDT to such hypotheses is not straightforward and, currently, is limited to considering the subclass of the hypotheses expressed as \(H:\varphi =\varphi _H\) that are able to partition the parameter space \(\varvec{\Theta }\) as \(\big \{ \varvec{\Theta }_a, \, \varvec{\Theta }_H, \, \varvec{\Theta }_b \big \}\). Moreover, notice that while the integration sets \(\varvec{\Theta }_a\) and \(\varvec{\Theta }_b\) are determined exclusively by the hypothesis, the tangential set \(\overline{T}\) depends on the hypothesis, the posterior density and the choice of the reference function. It is questionable, on the other hand, whether the e-value is as easily computable as the BDM is in cases where the parameter space has dimension higher than 1.

Unlike for the BDM, the elimination of nuisance parameters is not recommended when using the e-value. In fact, this measure is not invariant with respect to marginalisation over the nuisance parameters, and the use of marginal densities to construct credible sets may produce inconsistencies.

It is easy to see that one can create an analogy between the p-value, the e-value and \(\delta _H\). Regarding frequentist p-values, the sample space is ordered according to increasing inconsistency with the assumed null hypothesis H. The FBST instead orders the parameter space according to increasing inconsistency with the assumed null hypothesis H, based on the concept of statistical surprise. Similarly, the probability in (7) is the posterior probability of exceeding \(\theta _H\) in the direction that contrasts with the data (namely, moving away from the side where most of the posterior probability lies).

Another similarity occurs when considering the reference density \(r(\theta )\) as the (possibly improper) uniform density, since the first and second definitions of evidence define the same tangent set, i.e. the HRSS and the HPDS coincide. Then, for a scalar parameter \(\theta\), since the BDM is linked to the equi-tailed credible regions while the e-value is linked to the HPDS, we have that if:

  • \(g_1(\theta \vert \varvec{x})\) is symmetric and unimodal, then \(\overline{ev}(H) = \delta _H\);

  • \(g_1(\theta \vert \varvec{x})\) is asymmetric and unimodal (for instance with positive skewness) and \(m_1 < \theta _H\) [\(\theta _H < m_1\)], then \(\overline{ev}(H) > \delta _H\) [\(\overline{ev}(H) < \delta _H\)]. When \(m_1=\theta _H\) we have \(0=\delta _H < \overline{ev}(H)\).

4.1.1 Simulation study

In order to determine the resulting false-positive rates of both the FBST and the BDT, we conduct a simulation study for specific sample sizes, considering a continuous (Exponential) and a discrete (Poisson) model and, for each one, two different choices for the prior distribution: the Jeffreys’ and the conjugate priors. The latter have been chosen to have mean “far” from the true hypothesized values of the parameters. Regarding the FBST, we have considered two different choices for the reference function \(r(\theta )\): the flat and the prior density.

Let \(\varvec{x}=(x_1, \dots , x_n)\) be an iid sample of size n from the Exponential distribution \(X \sim Exp\big (x \vert 1/\theta ^* \big )\), with \(\theta ^*=1.2\). We are interested in testing the hypothesis \(H: \theta _H=\theta ^*=1.2\). Assuming the Jeffreys’ prior \(g_0(\theta ) \propto \theta ^{-1}\), the posterior distribution is \(InvGamma(\theta \vert n, \sum x_i)\) (see Example 1), while adopting a \(InvGamma(\theta \vert \alpha _0, \beta _0)\) prior, with \(\alpha _0=3\) and \(\beta _0=6\), we have a posterior that is still \(InvGamma(\theta \vert \alpha _1, \beta _1)\), with parameters \(\alpha _1=\alpha _0+n\) and \(\beta _1=\beta _0+\sum x_i\). Let now \(\varvec{y}=(y_1, \dots , y_n)\) be an iid sample of size n from a Poisson distribution \(Y \sim Poi\big (y \vert \lambda ^* \big )\), with \(\lambda ^*=3\). Interest is on the hypothesis \(H: \lambda _H=\lambda ^*=3\). For both choices of the prior, the Jeffreys’ \(g_0(\lambda ) \, \propto \; \lambda ^{-\frac{1}{2}}\) and the conjugate \(Gamma(\lambda \vert \alpha _0, \beta _0)\), we have a Gamma posterior \(Gamma(\lambda \vert \alpha _1, \beta _1)\), with parameters respectively equal to \(\alpha _1=\sum y_i+ \frac{1}{2}\), \(\beta _1=n\) and \(\alpha _1 = \alpha _0 + \sum y_i\), \(\beta _1 = \beta _0 + n\).

Table 1 shows the simulation results for three different values of the threshold \(\omega =\{0.90,0.95,0.99\}\), for \(S=50000\) simulations and \(D=50000\) posterior draws for the Exponential model. Concerning the Exponential model with the Jeffreys’ prior, across the different sample sizes considered the false-positive rates are very similar for both tests (the two different versions of the FBST and the BDM) and, as we expect since we are using objective priors (see Bayarri and Berger (2004)), they are close to the type I error rates \(\alpha =\{0.10, 0.05, 0.01\}\) associated with \(\omega\). With the conjugate prior the BDM seems to perform better w.r.t. the two versions of the FBST. Concerning the Poisson model, we have good results for large sample sizes, but also for smaller n, especially with the conjugate prior (see Table 2).

Table 1 Exponential distribution: false positive rates for different sample sizes n and different thresholds \(\omega\).
Table 2 Poisson distribution: false positive rates for different sample sizes n and different thresholds \(\omega\)

4.1.2 Some examples

In order to compare the BDM and the e-value, let us consider different situations and then examine the results.

Example 8

(Continuation of Example 1) As a first comparative scenario, consider the test performed in Example 1, in which \(\theta _H=2.4\), and additionally the case in which \(\theta _H=0.7\). Since the posterior \(g_1(\theta \vert \varvec{x})\) has positive skewness and \(m_1 < \theta _H=2.4\), we have \(\overline{ev}(H) > \delta _H\); on the contrary, for \(m_1 > \theta _H=0.7\), we have \(\overline{ev}(H) < \delta _H\). Indeed, we find the results reported in Table 3.

Table 3 Results, for the three different cases examined in Example 1, of \(\delta _H\) and the e-value considering, as a reference distribution, both a flat reference function and a Jeffreys’ prior

The differences between the e-value and \(\delta _H\), which in this example appear to be modest, can become meaningful when the posterior has greater asymmetry and heavier tails. In such cases, when comparing different hypotheses, the FBST always favours the hypothesis with higher density. Moreover, the e-value may be more or less robust w.r.t. the position of \(\theta _H\), as highlighted in the example below.

Example 9

- Test on the mean of the Inverse Gaussian distribution Consider a random variable X with Inverse Gaussian distribution \(X \sim IG(x \vert \mu , \nu _0)\), \(\mu \in {\mathbb {R}}^+\) and \(\nu _0\) known. Given an iid sample \(\varvec{x}\) of size n, the likelihood function for \(\mu\) is \(L(\mu \vert \varvec{x}) \propto \exp \left\{ -n \nu _0 \cdot \left( \frac{\bar{x}}{2\mu ^2}- \frac{1}{\mu }\right) \right\} .\) Adopting the Jeffreys’ prior \(g_0(\mu ) \propto \frac{1}{\sqrt{\mu ^3}}\), we obtain the posterior

$$\begin{aligned}g_1(\mu \vert {\varvec{x}}) \propto \frac{1}{\sqrt{\mu ^3}} \cdot \exp \left\{ -n \nu _0 \cdot \left( \frac{\bar{x}}{2\mu ^2}- \frac{1}{\mu }\right) \right\} . \end{aligned}$$

We are interested in testing the hypothesis \(H: \, \mu = \mu _H\) and we consider a sample of size \(n=8\) for which \(\bar{x} = 4.2\) and \(m_1 = 4.483\). For \(\nu _0=5\), we choose to test \(H_A: \mu = 2.5\) and \(H_B: \mu = 12\). The results of the analysis are displayed in Table 4 and Fig. 9. If we choose \(\omega = 0.95\) as a rejection threshold, in both cases, and with both reference functions, we are led to opposite inferential conclusions.

Table 4 For the two different hypotheses examined in Example 9, the table shows \(\delta _H\) and the e-value considering, as a reference distribution, both a flat reference function and a Jeffreys’ prior
Fig. 9

Posterior density function \(g_1(\mu \vert \varvec{x} )\) associated to Example 9. In [A] we have \(\mu _H = 2.5 < m_1\), while in [B] \(\mu _H = 12 > m_1.\)

Example 10

(Continuation of Examples 3, 4, 5) Let us now compare the results obtained with the FBST and the BDT for the Examples 3, 4 and 5, when fixing a value of 0.95 as a rejection threshold.

The conclusions reached with the FBST and with the BDT for Example 3, which can be seen in Table 5, are the same (for both reference functions considered) although, in some cases, there are substantial differences between the values of the evidence measures. To summarise, the hypothesis \(H_B\) has to be rejected while not enough evidence is available for the rejection of the hypotheses \(H_A\) and \(H_C\).

Moving on to Example 4, the analysis of the findings with the two tests is more involved than the previous one (see Table 6). In case [A], for both the BDT and the FBST with the flat reference function, there is not enough evidence to reject the hypothesis. On the contrary, if one considers the FBST with the Jeffreys’ prior as reference function, one is led to reject this hypothesis. In case [B], by rejecting the hypothesis, the BDT is in agreement with the FBST with the Jeffreys’ reference function, in contrast to the FBST with the flat reference function, for which there is not enough evidence to reject it.

Finally, in the case illustrated in Example 5, the conclusion reached with the FBST and with the BDT is the same (for both reference functions considered), i.e. there is not enough evidence to reject the hypothesis (see Table 7). It should be noted that, again, there are substantial differences between the values of the evidence measures.

Table 5 Results of the Example 3 on the test on the shape parameter, mean and variance of the Gamma distribution
Table 6 Results of the Example 4 on the test of the coefficient of variation for a Normal distribution
Table 7 Results of the Example 5 on the test of the skewness coefficient of the Inverse Gaussian distribution

The calculation of the FBST for a scalar parameter of interest without nuisance parameters has been carried out through the function defined in the ‘fbst’ package for R (Kelter 2022). Instead, the tangential sets \(\overline{T}\) and their integrals, for Examples 3, 4 and 5, were determined by means of the Mathematica software. Browsing through the code that leads to the calculation of these measures (see Manca (2022)), it is evident that more work is required for the calculation of the integration region related to the FBST. In this sense, the BDT appears to be easier to apply.

5 Conclusions

We propose a new measure of evidence in a Bayesian perspective. The examples illustrate the conceptual simplicity of the proposed method as well as its theoretical consistency. We have presented some simple cases where the computation of the BDM is straightforward.

In some situations, the BDM can be usefully applied with a subjective prior. The situation where one or more statisticians choose the hypothesis H and the prior according to their own knowledge is indeed interesting; in such cases the BDT would have a confirmatory value. The use of subjective priors must be accompanied by a robustness study, especially in the case of small sample sizes.

So far we have considered only hypotheses that induce a partition on the parameter space, but the extension of the definition and the analysis of the BDT to more complex hypotheses is under investigation. Theoretical and computational developments in more general contexts are also being explored.