A new Bayesian discrepancy measure

The aim of this article is to contribute to the Bayesian procedure for testing precise hypotheses in parametric models. For this purpose, we define the Bayesian Discrepancy Measure (BDM), which allows one to evaluate the suitability of a given hypothesis with respect to the available information (prior law and data). To summarise this information, the posterior median is employed, allowing a simple assessment of the discrepancy with a fixed hypothesis. The BDM assesses the compatibility of a single hypothesis with the observed data, as opposed to the more common comparative approach in which a hypothesis is rejected in favour of a competing one. The proposed measure of evidence has properties of consistency and invariance. After presenting the definition of the measure for a parameter of interest, both in the absence and in the presence of nuisance parameters, we illustrate some examples showing its conceptual and interpretative simplicity. Finally, we compare the Bayesian Discrepancy Test (BDT), the testing procedure based on the BDM, with the Full Bayesian Significance Test, a well-known Bayesian testing procedure for sharp hypotheses.


Introduction
D. V. Lindley in [1] (preface, page xi) stated that ". . . hypothesis testing looms large in standard statistical practice, yet scarcely appears as such in the Bayesian literature." Since then things have changed, and in the last sixty years there have been several attempts to build a measure of evidence that covers, in a Bayesian context, the role that the p-value has played in the frequentist setting. A prominent example is the decision test based on the Bayes Factor and its extensions (see, for instance, [2]).
As an alternative to the Bayes Factor, another Bayesian evidence measure is provided in [3], upon which the Full Bayesian Significance Test (FBST) is based. For a recent survey on the FBST see [4].
The main aim of this paper is to contribute to the procedure of testing precise hypotheses. In particular, the proposed Bayesian measure of evidence, called the Bayesian Discrepancy Measure (BDM), gives an absolute evaluation of a hypothesis H in light of the prior knowledge about the parameter and the observed data. The proposed measure of evidence has the desired properties of invariance under reparametrization and consistency for large samples.
Our starting point is the idea that a hypothesis may be more or less supported by the available evidence contained in the posterior distribution.
We do not adopt the hypothesis testing approach in which a hypothesis can be rejected only by comparing it with another hypothesis (Neyman-Pearson in the frequentist perspective, Bayes Factor in the Bayesian one), but rather the approach proposed by Fisher (see [5] and [6]). Reference is made to a precise hypothesis H, and no alternative is considered against such a hypothesis. In this view, different hypotheses made by several experts can be evaluated and, using the information coming from the same data, some may be accepted and others not. In this respect, in a broad sense, we can say that we return to Fisher's original idea of pure significance, according to which "Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis" ([7]). Notice that, since the Bayesian Discrepancy Test does not require any alternative hypothesis to be specified, the Jeffreys-Lindley paradox cannot arise, unlike with the Bayes Factor (see [8]).
The structure of the paper is as follows. In Section 2 the definition of the proposed index is presented for a scalar parameter of interest, both in the absence and in the presence of nuisance parameters. In Section 3 different illustrative examples are discussed, involving one or two independent populations. Finally, in Section 4 we make a comparison between the Bayesian Discrepancy Test and the Full Bayesian Significance Test, which is based on the e-value, a well-known Bayesian evidence index used to test sharp hypotheses. The last section contains conclusions and directions for further research.

The Bayesian Discrepancy Measure

Let (X, P_θ^X, Θ) be a parametric statistical model, where X ∈ X ⊂ R^k, P_θ^X = {f(x|θ) : θ ∈ Θ} is a class of probability distributions (Lebesgue integrable) defined on X, depending on an unknown vector of continuous parameters θ ∈ Θ, an open subset of R^p. Assume that: (a) the model is identifiable; (b) f(x|θ) has support not depending on θ, for all θ ∈ Θ; (c) the operations of integration and differentiation with respect to θ can be exchanged.
We assume a prior probability density g_0(θ) following Cromwell's Rule, which states that "it is inadvisable to attach probabilities of zero to uncertain events, for if the prior probability is zero so is the posterior, whatever be the data. A probability of one is equally dangerous, because then the probability of the complementary event will be zero" (see Section 6.2 in [9]).
First we discuss the case of a scalar parameter; then we discuss the case of a scalar parameter of interest in the presence of nuisance parameters.

The Bayesian Discrepancy Measure for a scalar parameter
In this section we assume that k = p = 1.
Given an iid random sample x = (x_1, . . ., x_n) from P_θ^X, let L(θ|x) be the corresponding likelihood function based on the data x and let g_0(θ) be the prior distribution on Θ ⊆ R. The posterior probability density for θ given x is then

g_1(θ|x) = g_0(θ) L(θ|x) / ∫_Θ g_0(θ) L(θ|x) dθ.

Moreover, given the posterior distribution function G_1(θ|x), the posterior median is any real number m_1 which satisfies the inequalities

P(θ ≤ m_1 | x) ≥ 1/2  and  P(θ ≥ m_1 | x) ≥ 1/2.

In the case in which G_1 is continuous and strictly increasing, we have m_1 = G_1^{-1}(1/2 | x).
We are interested in testing the precise hypothesis

H : θ = θ_H.   (1)

In order to measure the discrepancy of the hypothesis (1) w.r.t. the posterior distribution, in the case Θ = R, we consider the following two intervals:

1. the discrepancy interval I_H = (min{m_1, θ_H}, max{m_1, θ_H});
2. the external interval I_E, namely (−∞, θ_H) when θ_H < m_1, and (θ_H, +∞) when θ_H > m_1.

When m_1 = θ_H, the external interval I_E can be taken as (−∞, m_1) or (m_1, +∞). Note that, by construction, P(I_H ∪ I_E) = 1/2 (see Figure 1). If the support of the posterior is a subset of R, the intervals I_H and I_E are defined analogously.
Definition 1 Given the posterior distribution function G_1(θ|x), we define the Bayesian Discrepancy Measure of the hypothesis H as

δ_H = 2 P(θ ∈ I_H | x).   (4)

The measure can also be computed by means of the external interval as

δ_H = 1 − 2 P(θ ∈ I_E | x),   (5)

which can also be written as

δ_H = 1 − 2 min{P(θ ≤ θ_H | x), P(θ ≥ θ_H | x)}.   (6)

In the absolutely continuous case, this simplifies to

δ_H = 1 − 2 min{G_1(θ_H|x), 1 − G_1(θ_H|x)}.   (7)

Formulations (6) and (7) have the advantage of not involving the posterior median in the integral computation. Furthermore, one can interpret the quantity min{G_1(θ_H|x), 1 − G_1(θ_H|x)} as the posterior probability of a "tail" event concerning only the precise hypothesis H. Doubling this "tail" probability, one gets a posterior probability assessment of how "central" the hypothesis H is, and hence of how well it is supported by the prior and the data.
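In the absolutely continuous case, formulation (7) reduces the computation of the BDM to a single evaluation of the posterior CDF. The following sketch implements it for an arbitrary posterior CDF; the Normal posterior and the numbers used in the example are our own illustrative stand-ins, not taken from the paper.

```python
import math

def bdm(posterior_cdf, theta_h):
    """Bayesian Discrepancy Measure, continuous case:
    delta_H = 1 - 2 * min{G1(theta_H), 1 - G1(theta_H)}."""
    g = posterior_cdf(theta_h)
    return 1.0 - 2.0 * min(g, 1.0 - g)

def normal_cdf(x, mu, sigma):
    # Normal CDF written via the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical posterior N(1, 0.5^2); test H: theta = 2
delta = bdm(lambda t: normal_cdf(t, 1.0, 0.5), 2.0)
print(round(delta, 4))  # 0.9545
```

When θ_H equals the posterior median the two tails balance and δ_H = 0; as θ_H moves into either tail, δ_H approaches 1.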
It is important to highlight that the hypothesis H induces the following partition of the parameter space Θ:

Θ_a = {θ ∈ Θ : θ < θ_H},  Θ_H = {θ_H},  Θ_b = {θ ∈ Θ : θ > θ_H}.

Then formulations (6) and (7) can be equivalently expressed as

δ_H = 1 − 2 min{P(θ ∈ Θ_a | x), P(θ ∈ Θ_b | x)}.

The last formula can be naturally extended to the case where, besides the scalar parameter of interest, nuisance parameters are also present. This issue will be developed in Section 2.2.
As pointed out before, the further θ_H is from the posterior median m_1 of the distribution function G_1(θ|x), the closer δ_H is to 1, and it can then be said that H does not conform to G_1(θ|x). Conversely, the smaller δ_H, the stronger the evidence in favour of H. Following this idea, we can define a general testing procedure by choosing a threshold that establishes how large the measure must be before we can state that H does not conform to the posterior distribution.

Definition 2
The Bayesian Discrepancy Test (BDT) is the procedure based on the Bayesian Discrepancy Measure (BDM) that rejects the hypothesis H when δ_H is larger than some critical value ω ∈ {0.95, 0.99, 0.995, 0.999, . . .}.
As for all measures of evidence (Bayesian or frequentist), the chosen value for ω inevitably has a character of subjectivity.For a more detailed discussion on the threshold choice see the remark in Example 1.

Proposition 1
The following properties apply to the BDM, for a scalar parameter θ: (i) δ_H always exists and, by construction, δ_H ∈ [0, 1); (ii) δ_H is invariant under invertible monotonic transformations of the parameter θ; (iii) if θ* is the true value of the parameter, then: for θ_H = θ*, δ_H ∼ Unif(0, 1); for θ_H ≠ θ*, δ_H converges to 1 in probability as n → ∞.

Proof 1 (i) The first property follows immediately from the fact that in (4) the posterior probability P(θ ∈ I_H | x) ∈ [0, 1/2).
(ii) Let λ = λ(θ) be an invertible monotonic transformation of the parameter θ and let K_1 be the posterior cumulative distribution function of λ. We denote λ_H = λ(θ_H) and note that the posterior median of λ is λ(m_1), thanks to the invariance of the median under monotonic transformations. Suppose, for simplicity, that θ_H > m_1 and that λ is increasing. Then

δ_H = 2 P(θ ∈ (m_1, θ_H) | x) = 2 P(λ ∈ (λ(m_1), λ_H) | x).

Therefore, the invariance of the BDM follows immediately from the invariance of the median under invertible monotonic transformations. Notice that if, instead of the median m_1, we considered, for example, the posterior mean E(θ|x), which is not invariant under invertible monotonic reparametrizations, the property would not hold in general. Moreover, for some models E(θ|x) may not even exist.
(iii) Let us examine the first part of the statement, for which θ_H = θ*. Suppose that θ_H < m_1. The BDM is then

δ_H = 1 − 2 G_1(θ_H | x).

Using the probability integral transform and the fact that we have supposed θ_H < m_1, we easily find that W = G_1(θ_H | x) ∼ Unif(0, 1/2). Then, since δ_H = 1 − 2W, we find δ_H ∼ Unif(0, 1). A similar proof holds for θ_H > m_1. If, instead, θ_H ≠ θ* and n → ∞, under suitable regularity conditions (see for instance Section 5.3.2, p. 287 in [10]) it is well known that g_1(θ|x) concentrates around θ*. In particular, the posterior median m_1 converges in probability to θ*. Again, suppose for instance that θ_H < θ*; then G_1(θ_H | x) → 0 in probability, and hence δ_H → 1.
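Both claims in (iii) are easy to check by simulation in a toy model of our own choosing (not from the paper): X ∼ N(θ*, 1) with a flat prior, so that the posterior is N(x̄, 1/n) and δ_H = 1 − 2 min{Φ(z), 1 − Φ(z)} with z = (θ_H − x̄)√n.

```python
import math
import random

def bdm_normal(theta_h, xbar, n):
    # delta_H for the posterior N(xbar, 1/n) under a flat prior
    z = (theta_h - xbar) * math.sqrt(n)
    g = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 1.0 - 2.0 * min(g, 1.0 - g)

random.seed(1)
theta_star, n, S = 0.0, 20, 4000

# Case theta_H = theta*: delta_H should be Unif(0, 1), so its mean is near 0.5
d_true = [bdm_normal(theta_star,
                     sum(random.gauss(theta_star, 1) for _ in range(n)) / n,
                     n) for _ in range(S)]
print(sum(d_true) / S)  # close to 0.5

# Case theta_H != theta*: delta_H piles up near 1 as evidence accumulates
d_false = [bdm_normal(theta_star + 1.0,
                      sum(random.gauss(theta_star, 1) for _ in range(n)) / n,
                      n) for _ in range(S)]
print(sum(d_false) / S)  # close to 1
```

The uniformity in the first case mirrors the sampling behaviour of a p-value under the null, which is consistent with relation (15) discussed later for matching priors.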

The Bayesian Discrepancy Measure in the presence of nuisance parameters
Suppose that p ≥ 2 and k ≥ 1. Let ϕ = ϕ(θ) be a scalar parameter of interest, where ϕ : Θ → Φ ⊆ R. Let us further consider a bijective reparametrization θ ⇔ (ϕ, ζ), where ζ ∈ Z ⊆ R^(p−1) denotes an arbitrary nuisance parameter, determined on the basis of analytical convenience (note that the value of the evidence measure is invariant with respect to the choice of the nuisance parameter). We consider hypotheses that can be expressed in the form

H : ϕ = ϕ_H,   (11)

where ϕ_H is the known value that it is of interest to evaluate. The transformation ϕ must be such that, for all θ ∈ Θ and for all ϕ_H ∈ Φ, it can always be assessed whether ϕ is strictly smaller than, strictly larger than, or equal to ϕ_H. Hypothesis (11) and the transformation ϕ univocally identify the partition {Θ_a, Θ_H, Θ_b} of the parameter space Θ, with

Θ_a = {θ ∈ Θ : ϕ(θ) < ϕ_H},  Θ_H = {θ ∈ Θ : ϕ(θ) = ϕ_H},  Θ_b = {θ ∈ Θ : ϕ(θ) > ϕ_H}.   (12)

We call any hypothesis of type (11) which identifies a partition of the form (12) a partitioning hypothesis. It is easy to verify that many commonly used hypotheses are partitioning; in this paper we only consider hypotheses of this nature. In this setting, we express the BDM as

δ_H = 1 − 2 P(θ ∈ Θ_E | x),

where the external set Θ_E is the one, between Θ_a and Θ_b, with the smaller posterior probability. In the particular scenario where the marginal posterior of the parameter of interest ϕ can be computed in closed form, the hypothesis (11) can be easily treated using the methodology of Subsection 2.1, i.e. the BDM is computed by means of formula (4) or (5) applied to the marginal.
Properties reported in Proposition 1 naturally extend to the setting we just presented.
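When no closed form is available for the marginal of ϕ, the partition formulation suggests a simple Monte Carlo approximation: draw from the joint posterior, classify each draw into Θ_a or Θ_b, and apply the min-tail formula. Below is a sketch for the Normal model with both parameters unknown, whose Normal-Gamma posterior is recalled later in Example 4; the data values and the tested means here are hypothetical, chosen only for illustration.

```python
import math
import random

def bdm_normal_mean(xbar, s2, n, mu_h, draws=20000, seed=2):
    """Monte Carlo BDM for H: mu = mu_h in a N(mu, 1/phi) model with
    Jeffreys prior; posterior is Normal-Gamma(eta, nu, alpha, beta)."""
    rng = random.Random(seed)
    eta, nu = xbar, n
    alpha, beta = (n - 1) / 2.0, n * s2 / 2.0
    below = 0
    for _ in range(draws):
        phi = rng.gammavariate(alpha, 1.0 / beta)       # precision draw
        mu = rng.gauss(eta, 1.0 / math.sqrt(nu * phi))  # mean draw given phi
        if mu < mu_h:
            below += 1
    p_a = below / draws  # estimate of P(theta in Theta_a | x)
    return 1.0 - 2.0 * min(p_a, 1.0 - p_a)

# With xbar = 17, s2 = 1.6, n = 10: H: mu = 17 is central (delta near 0),
# while H: mu = 19 lies far in the tail (delta near 1)
print(bdm_normal_mean(17.0, 1.6, 10, 17.0))
print(bdm_normal_mean(17.0, 1.6, 10, 19.0))
```

The same two-line classification step works for any scalar function ϕ(θ) of the draws, which is what makes partitioning hypotheses convenient in practice.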

Illustrative examples
The simplicity of the BDT is highlighted by the following examples, some of which deal with cases not usually considered in the literature. Examples 1 and 2 focus on a scalar parameter of interest, while Examples 3, 4, 5, 6 and 7 also involve nuisance parameters.
In all examples we have adopted a Jeffreys prior (see [11] for a catalog of non-informative priors) for simplicity. However, other objective priors and, in the presence of substantive prior information, informative priors could equally be used.
Figure 2 shows the posterior density function as well as the discrepancy and the external intervals for the three cases considered. Note that, in all scenarios, we find the following relation between δ_H and the p-value:

δ_H = 1 − p-value,   (15)

since in [A] δ_H = 0.832 and p-value = 0.168, in [B] δ_H = 0.96 and p-value = 0.04, while in [C] δ_H = 0.997 and p-value = 0.003. This result clearly depends on the use of the Jeffreys prior, which is a matching prior for a scalar parameter (see [12]).
Remark 1 The fact that classical and Bayesian procedures, under certain conditions, produce the same conclusions is well known (see, for instance, [1]). The linear relationship (15) also occurs in other simple cases. Even if it does not hold for more complicated models, and in general for proper priors, it suggests a correspondence between the traditional p-value significance levels {0.05, 0.01, 0.005, . . .} and the critical values {0.95, 0.99, 0.995, . . .} for the discrepancy measure. In this paper we do not investigate the problem of the choice of the BDM threshold ω. Several aspects of the choice of p-value thresholds have been considered in [13] and can be suitably extended to the BDM.
(ii) δ_H is monotonically increasing, both with respect to n and with respect to the distance |m_1 − θ_H|; (iii) the posterior g_1 always has positive asymmetry, which decreases as n increases; (iv) the different behaviour of the BDM in cases [A] and [B] depends on the fact that the posterior g_1 has 'small' tails on the left-hand side of m_1 and 'large' tails on the right-hand side.
Moving forward, in order to highlight the evaluative nature of the BDT, it is worth pointing out that it allows the separate and simultaneous testing of two or more hypotheses, as shown in Example 2. Remember that, under the comparative approach, only one of the competing hypotheses is accepted. On the contrary, under the evaluative approach it may happen that several hypotheses are supported by the data, or even that all hypotheses must be rejected.

Example 2 - Evaluation of hypotheses made by several experts (Bernoulli distribution) In the 1700s, several hypotheses H_j : θ = θ_j were formulated about the birth masculinity rate θ = M/(M + F). Among them we consider θ_1 = 1/2 (J. Bernoulli), θ_2 = 13/25 (J. Arbuthnot), θ_3 = 1050/2050 (J. P. Süssmilch) and θ_4 = 23/45 (P. S. Laplace). We assume that the gender of each newborn is modeled as a Bin(·|1, θ). Then, using data recorded in 1710 in London (see, for instance, [14]), with 7640 males and 7288 females (the MLE is θ̂ = 0.512), and assuming the Jeffreys prior Beta(θ|1/2, 1/2), we compute δ_{H_j} using the Normal asymptotic approximation of the posterior g_1. Since δ_{H_1} = 0.996, δ_{H_2} = 0.955, δ_{H_3} = 0.079 and δ_{H_4} = 0.132, we conclude that the first two hypotheses have to be rejected, while there is not enough evidence to reject the hypotheses made by Süssmilch and Laplace.
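Example 2 can be reproduced in a few lines. Under a Beta(1/2, 1/2) prior (the Jeffreys choice for the Bernoulli model) the posterior is Beta(7640.5, 7288.5); as in the example, we replace it with its Normal approximation (same mean and variance) and apply the continuous-case formula. The helper names below are our own.

```python
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

males, females = 7640, 7288
a, b = males + 0.5, females + 0.5           # Beta posterior parameters
mean = a / (a + b)
var = a * b / ((a + b) ** 2 * (a + b + 1))  # exact Beta variance

hypotheses = {"Bernoulli": 1 / 2, "Arbuthnot": 13 / 25,
              "Sussmilch": 1050 / 2050, "Laplace": 23 / 45}
for name, th in hypotheses.items():
    g = normal_cdf(th, mean, math.sqrt(var))
    delta = 1.0 - 2.0 * min(g, 1.0 - g)
    print(f"{name}: delta_H = {delta:.3f}")
# Bernoulli: 0.996, Arbuthnot: 0.955, Sussmilch: 0.079, Laplace: 0.132
```

Each hypothesis is judged on its own: no pairwise comparison between the four values is ever made.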

Examples of the more general case
The examples presented hereafter can be divided into tests concerning a parameter, or a parametric function, of a single population, and tests concerning the comparison of parameters of two independent populations.

Tests involving a single population
Example 3 - Test on the shape parameter, mean and variance of the Gamma distribution Let x = (x_1, . . ., x_n) be an iid sample of size n from X ∼ Gamma(x|α, β), (α, β) ∈ R^+ × R^+. We denote by m_g the geometric mean of x. The likelihood function for (α, β) is given by

L(α, β|x) ∝ (β^α / Γ(α))^n m_g^{nα} e^{−nβx̄}.

For the fictitious data x = (0. . . .), let µ = α/β and σ² = α/β² denote the mean and the variance of X. We adopt the Jeffreys prior for (α, β), i.e. g_0(α, β) ∝ √(α ψ′(α) − 1)/β, where ψ′ denotes the trigamma function.

• Case [B]
The hypothesis H_B identifies the straight line of equation β = α/µ_H in the αβ-plane (see Figure 4 [B]) and the two subsets Θ_a and Θ_b. Since δ_H = 0.976, we reject H_B.

• Case [C]
The hypothesis H_C identifies the parabola of equation α = σ²_H β² in the αβ-plane (see Figure 4 [C]) and the two subsets Θ_a and Θ_b. Since δ_H = 0.846, we do not reject H_C.
Example 4 - Test on the coefficient of variation of a Normal distribution Given an iid sample x = (x_1, . . ., x_n) from X ∼ N(x|µ, φ^{−1}), the parameter of interest is the coefficient of variation ψ = 1/(µ√φ). We are interested in testing the hypothesis H : ψ = ψ_H with ψ_H = 0.1. If we consider the Jeffreys prior g_0(µ, φ) ∝ φ^{−1} · 1_{R×R^+}, the posterior distribution is the Normal-Gamma density with hyperparameters (η, ν, α, β), where η = x̄, ν = n, α = (n − 1)/2 and β = ns²/2. We consider the particular case in which x̄ = 17 and s² = 1.6 (so that the MLE is ψ̂ = 0.074), with two samples of sizes n = 10 and n = 40 (Figure 5 [A] and [B]).

Example 5 - Test on the skewness coefficient of the Inverse Gaussian distribution Let us consider an Inverse Gaussian random variable X with density IG(x|µ, ν), where (µ, ν) ∈ R^+ × R^+. The parameter of interest is the skewness coefficient γ = 3√(µ/ν), and it is of interest to test the hypothesis H : γ = γ_H, where γ_H = 2. The Jeffreys prior is g_0(µ, ν) ∝ µ^{−3/2} ν^{−1/2}. Given n observations, the posterior distribution of (µ, ν) depends on the data through x̄ and a, the arithmetic and the harmonic mean respectively. We apply the procedure to precipitation data (inches) from Jug Bridge, Maryland, analyzed in [15] (see Figure 6).

The hypothesis identifies in the parameter space the partition sets Θ_a, Θ_H and Θ_b (see Figure 6), and we obtain δ_H = 0.844. This result indicates that we do not have enough evidence to reject the hypothesis H.

Tests involving two independent populations
In this section we consider some examples concerning comparisons between parameters of two independent populations.

Example 6 -Comparison between means and precisions of two independent Normal populations
Let us consider a case study on the dating of the core and periphery of some wooden furniture, found in a Byzantine church, using radiocarbon (see [16], p. 409). The historians wanted to verify whether the mean age of the core is the same as the mean age of the periphery, using two samples of sizes m = 14 and n = 9, respectively. We assume that the age of the core X and of the periphery Y are distributed as X ∼ N(µ_1, φ_1^{−1}) and Y ∼ N(µ_2, φ_2^{−1}), and that the data are iid conditional on the parameters. For (µ_i, φ_i) we consider the Jeffreys prior g_0(µ_i, φ_i) ∝ φ_i^{−1}. We obtain x̄ = 1249.86, ȳ = 1261.33 and d̄ = x̄ − ȳ = −11.48, while the MLEs of the standard deviations are s_1 = 23.43 and s_2 = 12.51. The posterior distribution of (µ_i, φ_i) is the Normal-Gamma law with hyperparameters (η_i, ν_i, α_i, β_i). The hypothesis of interest, H : µ_1 = µ_2, identifies the subsets Θ_a = {µ_1 < µ_2} and Θ_b = {µ_1 > µ_2} in the parameter space. We can then compute

P(µ_1 > µ_2 | data) = 0.089,

so that δ_H = 0.823, a value that does not lead to the rejection of the hypothesis. Here we exploited the fact that the marginal of each µ_i is a generalized Student's t-distribution (denoted by StudentG) with hyperparameters η_i, β_i/(ν_i α_i), 2α_i. Figure 7 [A] shows, in the space (µ_1, µ_2), the contour lines of the joint distribution together with the partition sets. Note that the homoscedasticity assumption is not necessary. Consider now the hypothesis H : φ_1 = φ_2, which determines in the parameter space the subsets Θ_a = {φ_1 < φ_2} and Θ_b = {φ_1 > φ_2}. We find δ_H = 0.908 and we reject the hypothesis H. To compute the integral we used the fact that the marginal of each φ_i has a Gamma distribution with parameters (α_i, β_i), i = 1, 2.
Example 7 - Comparison of the shape parameters of two Gamma distributions Let us consider two iid samples from the Gamma populations X_i ∼ Gamma(α_i, β_i), i = 1, 2. We are interested in testing H : α_1 = α_2, which determines the subsets Θ_a = {α_1 < α_2} and Θ_b = {α_1 > α_2} (see Figure 8). The posterior distribution of (α_1, β_1, α_2, β_2) factorizes over the two independent populations. In order to test the hypothesis H, we compute the posterior probability of the external set and, since δ_H = 0.378, we do not reject H.

Comparison with the FBST
In this section we present a comparison of the BDT with the Full Bayesian Significance Test (FBST) as presented in [4], which provides an overview of the e-value.
In order to facilitate the discussion, let us briefly review the definition of the e-value and the related testing procedure. The FBST can be used with any standard parametric statistical model, where θ ∈ Θ ⊆ R^p. It tests a sharp hypothesis H which identifies the null set Θ_H. The conceptual approach of the FBST consists in determining the e-value, which represents the Bayesian evidence against H. To construct this measure, the authors introduce the posterior surprise function and its supremum over the null set, given respectively by

s(θ) = g_1(θ|x) / r(θ)  and  s* = sup_{θ ∈ Θ_H} s(θ),

where r(θ) is a suitable reference function to be chosen. Then, the tangential set to the sharp hypothesis H, also called a Highest Relative Surprise Set (HRSS), is defined as

T(s*) = {θ ∈ Θ : s(θ) > s*},

which includes all parameter values θ whose surprise function exceeds the supremum s* over the null set. Finally, the e-value against H is defined as

ev̄(H) = P(θ ∈ T(s*) | x).

On the contrary, the e-value in support of H is ev(H) = 1 − ev̄(H), which is evaluated by means of the set T̄(s*) = Θ \ T(s*) and the cumulative surprise function W̄(s*) = 1 − W(s*). In conclusion, the FBST is the procedure that rejects H whenever ev̄(H) is large. As pointed out in [4] (Section 3.2), "the role of the reference density is to make ev(H) explicitly invariant under suitable transformations of the coordinate system". A first, non-invariant definition of this measure, corresponding to the use of a flat reference function r(θ) ∝ 1, was given in [3]. The first version determined the tangential set T from the posterior distribution alone, whereas the second introduces a corrective element by also including the reference function. Some of the suggested choices for the reference function are uninformative priors such as "the uniform, maximum entropy densities, or Jeffreys' invariant prior" (see [4], Section 3.2).

Similarities and differences between the procedures
The most striking similarity between the FBST and the BDT is that both tests, fully accepting the likelihood principle and relying on the posterior distribution of the parameter θ ∈ Θ, are clearly Bayesian.
Another important similarity is that, asymptotically, both tests lead to the rejection of the hypothesis H when it is false (i.e. when we test θ_H ≠ θ*, where θ* is the true value of the parameter). On the contrary, if θ_H = θ* they have a different asymptotic behaviour (see Proposition 1 for the BDM and Section 3.4 in [4] for the e-value).
Certainly, the FBST has a more general reach than the BDT. Indeed, it covers the entire class of sharp hypotheses, whereas the extension of the BDT to such hypotheses is not straightforward and is currently limited to the subclass of hypotheses H : ϕ = ϕ_H that partition the parameter space Θ into Θ_a, Θ_H and Θ_b. Moreover, notice that while the integration sets Θ_a and Θ_b are determined exclusively by the hypothesis, the tangential set T depends on the hypothesis, the posterior density and the choice of the reference function. It is questionable, on the other hand, whether the e-value is as easily computable as the BDM in cases where the parameter space has dimension higher than 1.
Unlike for the BDM, the elimination of nuisance parameters is not recommended when using the e-value. In fact, this measure is not invariant with respect to marginalisation over the nuisance parameters, and the use of marginal densities to construct credible sets may produce inconsistencies.
An analogy can be drawn between the p-value, the e-value and δ_H. For frequentist p-values, the sample space is ordered according to increasing inconsistency with the assumed null hypothesis H. The FBST instead orders the parameter space according to increasing inconsistency with the assumed null hypothesis H, based on the concept of statistical surprise. In the same way, it can be seen that the probability in (7) has to do with the posterior probability of exceeding θ_H in a direction in contrast with the data (namely, the side where there is more posterior probability).
Another similarity occurs when the reference density r(θ) is taken to be the (possibly improper) uniform density, since the first and second definitions of evidence then determine the same tangential set, i.e. the HRSS and the highest posterior density set (HPDS) coincide. Then, for a scalar parameter θ, since the BDM is linked to equi-tailed credible regions while the e-value is linked to the HPDS, we have that if:
• g_1(θ|x) is symmetric and unimodal, then ev̄(H) = δ_H;
• g_1(θ|x) is asymmetric and unimodal (for instance with positive skewness) and m_1 < θ_H, then ev̄(H) > δ_H, while for m_1 > θ_H we have ev̄(H) < δ_H.
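The symmetric case is easy to verify numerically: with r(θ) ∝ 1 and a symmetric unimodal posterior, the tangential set is the symmetric interval where the density exceeds g_1(θ_H|x), and its posterior probability equals δ_H. A self-contained check for a Normal posterior (our own toy numbers, not from the paper):

```python
import math

mu, sigma, theta_h = 0.0, 1.0, 1.5

def pdf(t):
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def cdf(t):
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))

# BDM via the equi-tailed formula (7)
g = cdf(theta_h)
delta = 1.0 - 2.0 * min(g, 1.0 - g)

# e-value against H via the tangential set: with a flat reference,
# T = {t : g1(t) > g1(theta_h)} = (mu - d, mu + d) with d = |theta_h - mu|;
# integrate the posterior density over T with a midpoint Riemann sum
d = abs(theta_h - mu)
step = 1e-4
ts = [mu - d + (i + 0.5) * step for i in range(int(2 * d / step))]
ev_against = sum(pdf(t) for t in ts) * step

print(abs(ev_against - delta) < 1e-4)  # True: the two measures coincide
```

For a skewed posterior the HPD-style set T and the equi-tailed construction no longer match, which is exactly the source of the inequalities in the second bullet.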

Simulation study
In order to determine the resulting false-positive rates of both the FBST and the BDT, we conduct a simulation study for specific sample sizes.
Table 1 shows the simulation results for three different values of the threshold, ω ∈ {0.90, 0.95, 0.99}, with S = 50000 simulations and D = 50000 posterior draws. Across the different sample sizes considered, the false-positive rates are very similar for both tests and, as expected since we are using objective priors (see [17]), they are close to the type-I error levels α ∈ {0.10, 0.05, 0.01} corresponding to ω. Similar results, not reported here, were found adopting a Poisson model.
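A scaled-down version of this kind of simulation can be sketched as follows. We use our own setup, a Poisson model with the Jeffreys prior g_0(θ) ∝ θ^(−1/2), so that the posterior is Gamma(Σx + 1/2, rate n), and estimate δ_H from posterior draws; with S and D far smaller than in the paper, the rates are only roughly close to α = 1 − ω.

```python
import math
import random

def rpois(lam, rng):
    # Knuth's Poisson sampler (adequate for small lam)
    L = math.exp(-lam)
    k, p = 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

rng = random.Random(3)
theta_star, n = 3.0, 50
S, D, omega = 500, 1000, 0.95

rejections = 0
for _ in range(S):
    sx = sum(rpois(theta_star, rng) for _ in range(n))
    # posterior Gamma(shape = sx + 1/2, rate = n); gammavariate takes a scale
    draws = [rng.gammavariate(sx + 0.5, 1.0 / n) for _ in range(D)]
    p_a = sum(t < theta_star for t in draws) / D
    delta = 1.0 - 2.0 * min(p_a, 1.0 - p_a)
    rejections += delta > omega

print(rejections / S)  # roughly alpha = 1 - omega = 0.05
```

Increasing S and D toward the paper's values tightens the agreement between the empirical rate and α, up to the discreteness of the Poisson data.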

Some Examples
In order to compare the BDM and the e-value, let us consider different situations and then examine the results.

Example 8 (Continuation of Example 1)
As a first comparative scenario, consider the test performed in Example 1, in which θ_H = 2.4, and additionally the case θ_H = 0.7. Since the posterior g_1(θ|x) has positive skewness and m_1 < θ_H = 2.4, we have ev̄(H) > δ_H; on the contrary, for m_1 > θ_H = 0.7 we have ev̄(H) < δ_H. Indeed, we find the results reported in Table 2.
The differences between the e-value and δ_H, which in this example appear to be modest, can actually become meaningful when the posterior has greater asymmetry and heavier tails. In such cases, comparing different hypotheses, the FBST always favours the hypothesis with higher density; moreover, the e-value may be more or less robust w.r.t. the position of θ_H, as highlighted in Example 9.

Adopting the Jeffreys prior g_0(µ) ∝ µ^{−3/2}, we obtain the posterior g_1(µ|x) ∝ µ^{−3/2} exp[−nν_0(x̄/(2µ²) − 1/µ)]. We are interested in testing the hypothesis H : µ = µ_H, and we consider a sample of size n = 8 for which x̄ = 4.2 and m_1 = 4.483. For ν_0 = 5, we choose to test H_A : µ = 2.5 and H_B : µ = 12. The results of the analysis are displayed in Table 3 and Figure 9. If we choose ω = 0.95 as rejection threshold, in both cases, and with both reference functions, we are led to opposite inferential conclusions.

The conclusions reached with the FBST and with the BDT for Example 3, which can be seen in Table 4, are the same (for both reference functions considered) although, in some cases, there are substantial differences between the values of the evidence measures. To summarise, the hypothesis H_B has to be rejected, while not enough evidence is available for the rejection of the hypotheses H_A and H_C.
Moving on to Example 4, the analysis of the findings with the two tests appears more complex than the previous one; see Table 5. In case [A], for both the BDT and the FBST with the flat reference function, there is not enough evidence to reject the hypothesis. On the contrary, if one considers the FBST with the Jeffreys prior as reference function, one is led to reject it. In case [B], by rejecting the hypothesis, the BDT is in agreement with the FBST with the Jeffreys reference function, in contrast to the FBST with the flat reference function, for which there is not enough evidence for rejection.
Finally, in the case illustrated in Example 5, the conclusion reached with the FBST and with the BDT is the same (for both reference functions considered), i.e. there is not enough evidence to reject the hypothesis (see Table 6). It should be noted that, again, there are substantial differences between the values of the evidence measures. The calculation of the FBST for a scalar parameter of interest without nuisance parameters has been carried out through the function provided in the 'fbst' package [18]. The tangential sets T and their integrals, for Examples 3, 4 and 5, were instead determined by means of the Mathematica software.

Figure 2 :
Figure 2: Posterior density function g_1(θ|x) and intervals I_H = (m_1, θ_H) and I_E, with θ_H = 2.4 and the MLE x̄ = 1.2, for three sample sizes: [A] n = 6, [B] n = 12, [C] n = 24. In [A] we have posterior median m_1 = 1.27 and δ_H = 0.832, while in [B] m_1 = 1.23 and δ_H = 0.960, and in [C] m_1 = 1.22 and δ_H = 0.997. In case [A] we do not reject H, while in [B] and [C] we are led to reject the hypothesis.

Figure 4 :
Figure 4: Posterior density function g 1 (α, β|x) from Example 3 and corresponding sets of the induced partition in the cases [A], [B] and [C].

Figure 5 :
Figure 5: Test on the coefficient of variation ψ of a Gaussian population. Data refer to Example 4. In the plots, the sets Θ_a, Θ_b and Θ_H are reported for n = 10 ([A]) and n = 40 ([B]).

Figure 6 :
Figure 6: Test on the skewness of the Inverse Gaussian distribution with γ_H = 2. In the plot, the sets of the partition induced by H are reported. Data refer to Example 5.

Figure 7 :
Figure 7: Comparisons between means ([A]) and precisions ([B]) of independent Normal populations for the data in Example 6. For both cases we show the contour plots of the marginals of µ_j ([A]) and φ_j ([B]), and the partition sets associated with the corresponding hypotheses.

Figure 8 :
Figure 8: Comparison of the shape parameters of two independent Gamma populations, using data of Example 7. The sets Θ a , Θ b and Θ H of the partition are reported.

Example 10 (Continuation of Examples 3, 4, 5)
Let us now compare the results obtained with the FBST and the BDT for Examples 3, 4 and 5, fixing 0.95 as the rejection threshold.

Table 4 :
The table shows the results of Example 3 on the test of the shape parameter, mean and variance of the Gamma distribution. For the e-value we have considered, as reference function, both a flat reference and the Jeffreys prior.

Table 1 :
False positive rates for different sample sizes n and different thresholds ω.

Table 2 :
The table shows, for the 3 different cases examined in Example 1, the values of δ_H and of the e-value considering, as reference function, both a flat reference and the Jeffreys prior.
Example 9 - Test on the mean of the Inverse Gaussian distribution Consider a random variable X with Inverse Gaussian distribution X ∼ IG(x|µ, ν_0), µ ∈ R^+ and ν_0 known. Given an iid sample x of size n, the likelihood function for µ is

L(µ|x) ∝ exp[ −nν_0 ( x̄/(2µ²) − 1/µ ) ].

Table 3 :
For the two hypotheses examined in Example 9, the table shows δ_H and the e-value considering, as reference function, both a flat reference and the Jeffreys prior.