fbst: An R package for the Full Bayesian Significance Test for testing a sharp null hypothesis against its alternative via the e-value

Hypothesis testing is a central statistical method in psychology and the cognitive sciences. However, while the problems of null hypothesis significance testing (NHST) and p-values have been debated widely, few attractive alternatives exist. This article introduces the fbst R package, which implements the Full Bayesian Significance Test (FBST) to test a sharp null hypothesis against its alternative via the e-value. The statistical theory of the FBST was introduced more than two decades ago, and since then the FBST has proven to be a Bayesian alternative to NHST and p-values with highly appealing theoretical and practical properties. The algorithm provided in the fbst package is applicable to any Bayesian model as long as the posterior distribution can be obtained at least numerically. The core function of the package provides the Bayesian evidence against the null hypothesis, the e-value. Additionally, p-values based on asymptotic arguments can be computed, and rich visualizations for communication and interpretation of the results can be produced. Three examples of statistical procedures frequently used in the cognitive sciences are given in this paper, which demonstrate how to apply the FBST in practice using the fbst package. Based on the success of the FBST in statistical science, the fbst package should be of interest to a broad range of researchers and will hopefully encourage researchers to consider the FBST as a possible alternative when conducting hypothesis tests of a sharp null hypothesis.


Introduction
Hypothesis testing is a widely used method in the cognitive sciences and in experimental psychology. However, the recently experienced replication crisis troubles the experimental sciences, and the underlying problems are still widely debated (Wagenmakers and Pashler, 2012; Pashler and Harris, 2012; Wasserstein et al., 2019; Haaf et al., 2019). Among the identified problems is the inappropriate use and interpretation of p-values, which are used in combination with null hypothesis significance tests (NHST) (Benjamin and Berger, 2019; Benjamin et al., 2018; Colquhoun, 2014, 2017). As a consequence, in 2016 the American Statistical Association issued a statement about the identified problems and recommended considering alternatives to p-values or supplementing data analysis with further measures of evidence: "All these measures and approaches rely on further assumptions, but they may more directly address the size of an effect (and its associated uncertainty) or whether the hypothesis is correct." (Wasserstein and Lazar, 2016, p. 132) Due to the problems with NHST and p-values, the editors of Basic and Applied Social Psychology even decided to ban p-values and NHST completely from their journal.

arXiv:2006.03332v1 [stat.ME] 5 Jun 2020
In the recent literature, various proposals have been made for how to improve the reproducibility of research and the quality of statistical data analysis, in particular the reliability of statistical hypothesis tests. These proposals range from stricter thresholds for stating statistical significance (Benjamin et al., 2018) to more profound methodological changes (Kruschke and Liddell, 2018a; Wagenmakers et al., 2016; Morey et al., 2016b). In the last category, an often-stated solution is a shift towards Bayesian data analysis (Wagenmakers et al., 2016; Kruschke and Liddell, 2018a; Kruschke et al., 2012; Ly et al., 2016a,b). The advantages of such a shift include the adherence of Bayesian methods to the likelihood principle (Birnbaum, 1962), which has important implications. Among them are the simplified interpretation and appealing properties of Bayesian interval estimates for quantifying the uncertainty in parameter estimates (Morey et al., 2016a). Others are the independence of results from the researcher's intentions (Kruschke and Liddell, 2018b; Berger and Wolpert, 1988; Edwards et al., 1963) as well as the ability to make use of optional stopping (Rouder, 2014). The last property is particularly appealing in practical research, as it allows researchers to stop recruiting participants and report the results based on the collected data in case these already show overwhelming evidence. Notice that this is not permitted when making use of NHST and p-values, which can lead to financial and ethical problems, in particular in the biomedical and psychological sciences.
Considering Bayesian alternatives to NHST and p-values, the most prominent approach to Bayesian hypothesis testing is the Bayes factor, which was invented by Jeffreys (1931); see also Etz and Wagenmakers (2015). The Bayes factor is often advocated as a Bayesian alternative to the frequentist p-value when it comes to hypothesis testing, in particular in the cognitive sciences and psychology (Van De Schoot et al., 2017; Wagenmakers et al., 2010, 2016; Ly et al., 2016b; van Doorn et al., 2019; van Dongen et al., 2019). However, there are also other approaches, like Bayesian equivalence testing based on the region of practical equivalence (ROPE) (Kruschke, 2013, 2015, 2018; Kruschke and Liddell, 2018b; Westlake, 1976; Kirkwood and Westlake, 1981; Kelter, 2020a; Liao et al., 2020), which is based on an analogy to frequentist equivalence tests (Lakens, 2017; Lakens et al., 2018). Also, there exist various other measures and alternatives for testing hypotheses in the Bayesian approach, including the MAP-based p-value (Mills, 2017), the probability of direction (PD) (Makowski et al., 2019b,a) and the Full Bayesian Significance Test (FBST) (Pereira and Stern, 1999; Stern, 2003; Madruga et al., 2001, 2003; Pereira et al., 2008; Pereira and Stern, 2020; Esteves et al., 2019). In contemporary literature, there is still a debate about which Bayesian measure to use in which setting for scientific hypothesis testing, and while some authors argue in favour of the Bayes factor (Wagenmakers et al., 2016; Etz and Vandekerckhove, 2016; Kelter, 2020b), there is also criticism of the focus on the Bayes factor in the cognitive sciences (Tendeiro and Kiers, 2019; Greenland, 2019). By now, comparisons of different Bayesian posterior indices are rare, but the existing results show that it is useful to consider various different Bayesian approaches to hypothesis testing depending on the research goal and study design; see Kelter (2020a), Makowski et al. (2019b) and Liao et al. (2020).
In this paper, attention is directed to one specific Bayesian alternative to NHST and p-values, the Full Bayesian Significance Test (FBST) and the e-value, and the R package fbst is introduced. The FBST was developed over two decades ago in the statistical literature (Pereira and Stern, 1999) and has since been employed successfully in a broad range of scientific areas and applications. It is not possible to cover all theoretical and practical work which has been pursued concerning the FBST in the last two decades, and for a concise review we refer the reader to Pereira and Stern (2020). The R package fbst introduced in this paper offers an intuitive and widely applicable software implementation of the FBST and the e-value. The package has been designed to work in combination with widely used R packages for fitting Bayesian models in the cognitive sciences and psychology, and it offers appealing visualisations to communicate and share the results of an analysis with colleagues.
The structure of this paper is as follows: First, we describe the underlying theory of the FBST and the e-value. Second, we give information about the available functionality and software implementation of the package. Subsequently, we demonstrate with two examples of widely used statistical models in psychological research how the FBST can be applied in practice via the fbst package. Finally, we conclude by drawing attention to the benefits and limitations of the package and give some ideas about future extensions. In summary, the FBST and e-value could be an appealing Bayesian alternative to NHST and p-values which has so far been widely under-utilised in the cognitive sciences and psychology. This can largely be attributed to the dearth of accessible software implementations, one of which is presented in the form of the R package introduced in this paper. The fbst package will hopefully foster critical discussion and reflection about different approaches to Bayesian hypothesis testing and enable further research on the relationship between different posterior indices for significance and effect size (Kelter, 2020a; Makowski et al., 2019b; Liao et al., 2020).

The FBST and the e-value
This section describes the statistical theory behind the FBST and the e-value in more detail. The philosophical basis (or conceptual approach) is described briefly first, and subsequently the necessary notation is introduced.

Conceptual approach of the FBST
The Full Bayesian Significance Test was first introduced by Pereira and Stern (1999) more than two decades ago as a Bayesian alternative to traditional frequentist null hypothesis significance tests. It was invented to test a sharp (or precise) point null hypothesis H0 against its alternative H1.
Traditional frequentist approaches measure the inconsistency of the observed data with a null hypothesis H0 (Kempthorne, 1976; Cox et al., 1977). Frequentist hypothesis tests employ p-values to order the sample space according to increasing inconsistency with the hypothesis. Notice that a p-value is defined as the probability of obtaining a result (which, of course, is located in the sample space) equal to or more extreme than the one observed under the assumption of the null hypothesis H0 (Held and Sabanés Bové, 2014). In contrast, the e-value produced in the FBST aims at ordering the parameter space according to increasing inconsistency with the observed data (Pereira et al., 2008). In formulas, traditional frequentist significance tests use the p-value

p := P(x ∈ C | H0)

to reject the null hypothesis H0. Here, C often is the set of sample space values x ∈ X (where X is the sample space) for which a test statistic T_θ0 (derived under the assumption of the null hypothesis value θ0) is at least as large as the test statistic value t calculated from the observed data, that is, C = {x ∈ X : T_θ0(x) ≥ t}. The set C can be interpreted as the set of sample space values x ∈ X which are at least as inconsistent with the null hypothesis H0 as the observed data. The p-value now quantifies the evidence against H0 by calculating the probability of sample space values x being located precisely in this set (Casella and Berger, 2002).
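As a toy numerical illustration of this sample-space ordering (a sketch with made-up numbers, not an example from the paper), consider a one-sample z-test of H0 : θ0 = 0 with known standard deviation, where the set C collects all samples whose test statistic is at least as large as the observed one:

```python
import math

# Hypothetical one-sample z-test of H0: theta_0 = 0 with known sigma = 1.
# The test statistic T(x) = |sqrt(n) * mean(x)| orders the sample space by
# increasing inconsistency with H0; C = {x : T(x) >= t} for the observed t.
n, xbar = 50, 0.30                      # made-up sample size and observed mean
t = abs(xbar) * math.sqrt(n)            # observed value of the test statistic

# p-value = P(x in C | H0) = 2 * (1 - Phi(t)), since T is |N(0, 1)| under H0
Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
p_value = 2 * (1 - Phi(t))
print(round(p_value, 4))
```

The p-value is the probability mass that H0 assigns to the set C, a statement about the sample space rather than about the parameter.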
The idea put forward in Pereira and Stern (1999) and Pereira et al. (2008) is simple: Instead of considering the sample space, a Bayesian should inspect the tangential set T̄ of parameter values (which are, of course, located in the parameter space). This set consists of all parameter values which are more consistent with the observed data x than θ0. Here, the Bayesian evidence ev is defined as

ev := P(θ ∈ T̄ | x),

and ev̄ := 1 − ev. ev̄ can be interpreted as the evidence in favour of the null hypothesis H0, while ev is interpreted as the evidence against H0. The latter value is the probability of all parameter values θ which are more consistent with the data x than the null value θ0. The conceptual approach of the FBST consists, as a consequence, of constructing a duality between Bayesian theory and frequentist sampling theory: frequentist significance measures are based on ordering the sample space according to increasing inconsistency with the hypothesis, while the Bayesian e-value is based on ordering the parameter space according to increasing inconsistency with the observed data. This conceptual basis ensures that the FBST allows a seamless transition to Bayesian data analysis for researchers who are acquainted with NHST and p-values: The FBST produces the e-value, which can be interpreted similarly to the frequentist p-value, and few methodological changes are required. However, the consequences of the conceptual basis of the FBST are substantial: As the quantity ev is a fully Bayesian quantity, it allows statements in terms of probability to quantify the evidence. Traditional frequentist measures like p-values do not make probabilistic statements about the parameter (because they are computed over the sample space instead of the parameter space), which is questionable, as the goal of a study or experiment is to quantify the uncertainty about a given research hypothesis, which naturally should be done via probability measures (Howie, 2002; Berger and Wolpert, 1988). As a consequence, the FBST and the e-value follow the likelihood principle (Birnbaum, 1962; Basu, 1975; Berger and Wolpert, 1988), which brings several advantages with it:

- Researchers can use optional stopping. This means that they are allowed to stop recruiting participants or even abort an experiment and readily report the results when only a fraction of the data already shows overwhelming evidence for or against the hypothesis under consideration (Edwards et al., 1963; Rouder, 2014).
- Censored data (which are often observed in longitudinal studies or clinical trials in the cognitive sciences and psychology) can be interpreted easily (Berger and Wolpert, 1988). The likelihood contribution of a single observation in a study where no censoring was possible is equal to the likelihood contribution of a single observation in a study where censoring is possible but did not occur (for the single observation considered). This simplifies the analysis and interpretation of statistical models which include censoring mechanisms, see (Berger and Wolpert, 1988, Chapter 4).
- As highlighted by Edwards et al. (1963), Wagenmakers et al. (2016), and Kruschke (2018), the result of a hypothesis test (in this case, the FBST) is not influenced by the researchers' intentions. This last property is substantial for improving the reliability of research in the cognitive sciences and psychology, see McElreath and Smaldino (2015).

Statistical theory of the FBST
In this section, we introduce the necessary mathematical notation for a rigorous understanding of the FBST.
The FBST can be used with any standard parametric statistical model, where θ ∈ Θ ⊆ R^p is a (vector-valued) parameter of interest, p(x|θ) is the model likelihood, and p(θ) is the prior distribution for the parameter θ. A sharp (or, expressed equivalently, precise) hypothesis H0 makes a statement about the parameter θ: Specifically, the null hypothesis H0 states that θ lies in the so-called null set Θ_H0.
For simple point null hypotheses like H0 : θ = θ0, which are often used in practice, this null set consists of the single parameter value θ0, so that the null set can be written as Θ_H0 = {θ0}. As detailed in the previous section, the conceptual approach of the FBST is to state the Bayesian evidence against H0, the e-value. This value is the proposed Bayesian replacement of the traditional p-value. To construct the e-value, Pereira et al. (2008) introduced the posterior surprise function

s(θ) := p(θ|x) / r(θ),   (1)

which is the ratio of the posterior distribution p(θ|x) and a suitable reference function r(θ). Two important special cases are given by a flat reference function r(θ) = 1 and by any prior distribution p(θ) for the parameter θ. First, when a flat reference function is selected, the surprise function recovers the posterior distribution p(θ|x). Second, when a prior distribution is used as the reference function, one can interpret parameter values θ with a surprise function value s(θ) ≥ 1 as being corroborated by the observed data x. In contrast, parameter values θ with a surprise function value s(θ) < 1 have not been corroborated by the data. The next step is to calculate the supremum s* of the surprise function s(θ) over the null set Θ_H0,

s* := sup_{θ ∈ Θ_H0} s(θ).
This supremum is subsequently used in combination with the tangential set, which has been introduced in the last section. Pereira et al. (2008) defined

T(ν) := {θ ∈ Θ : s(θ) ≤ ν},   (2)

and the tangential set T̄(ν) to the sharp null hypothesis H0 is then given as

T̄(ν) := Θ \ T(ν) = {θ ∈ Θ : s(θ) > ν}.   (3)

When setting ν = s*, the tangential set T̄(s*) has the interpretation discussed in the previous section: While T(s*) includes all parameter values θ whose surprise function value is smaller than or equal to the supremum value s* over the null set, the tangential set T̄(s*) includes all parameter values θ which attain a larger surprise function value than this supremum. The final step to obtain the e-value, the Bayesian evidence against H0, is to make use of the cumulative surprise function

W(ν) := ∫_{T(ν)} p(θ|x) dθ.   (4)

The cumulative surprise function W(ν) is simply the integral of the posterior distribution p(θ|x) over all parameter values with surprise function values s(θ) ≤ ν. Setting ν = s*, the cumulative surprise function W(s*) becomes the integral of the posterior p(θ|x) over T(s*), that is, over all parameter values which have a surprise function value s(θ) ≤ s*. The e-value is then given as

ev(H0) := W̄(s*),   (5)

where W̄(ν) := 1 − W(ν). Figure 1a visualises the FBST and the e-value ev(H0). The solid line shows the posterior distribution p(δ|x) of the effect size δ after observing the data x, and is produced by a Bayesian two-sample t-test (Kelter, 2020d). A flat reference function r(δ) = 1 was selected in figure 1a.
The supremum over the null set Θ_H0 = {0} is s* = s(0), shown as the blue point. The horizontal blue dashed line visualises the boundary between T(s(0)) and T̄(s(0)): values with posterior density p(δ|x) > p(0|x) are located in the tangential set T̄(s(0)), while values with p(δ|x) ≤ p(0|x) are located in T(s(0)). The blue shaded area is the cumulative surprise function value W̄(s(0)), which is the integral over the tangential set T̄(s(0)) against H0 : δ = 0. This is the e-value ev(H0) against H0, the Bayesian evidence against the sharp null hypothesis.
The red shaded area is the integral W(s(0)) over T(s(0)), which equals the e-value ev̄(H0) in favour of H0 : δ = 0. Figure 1b shows the same situation, but now the reference function is a wide Cauchy prior C(0, 1), so that the surprise function becomes

s(δ) = p(δ|x) / c(δ),

where c(δ) is the p.d.f. of the C(0, 1) Cauchy distribution. Although the situation seems similar to figure 1a, the scaling of the y-axis is now different. Also, the evidence has changed based on the new surprise function, and the interpretation of the surprise function has changed, too. While in figure 1a the surprise function could be interpreted as the posterior distribution, it is now interpreted as follows: If one assumes a Cauchy prior C(0, 1) on the effect size δ, then parameter values with a surprise function value s(δ) ≥ 1 can be interpreted as being corroborated by the data, while parameter values with a surprise function value s(δ) < 1 are interpreted as not being corroborated by the data. Pereira and Stern (1999) formally defined the e-value ev̄(H0) in support of H0 as

ev̄(H0) := 1 − ev(H0),   (6)

but notice that one cannot interpret this value as evidence against H1. This can be attributed to the fact that H1 is not even a sharp hypothesis, see Definition 2.2 in Pereira et al. (2008).
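The definitions above can be made concrete with a small Python sketch (the toy values below are chosen for illustration and are not the posterior shown in the figures): for a normal posterior with a flat reference function, the tangential set is the interval of parameter values with higher posterior density than the null value, and its posterior mass, the e-value against H0, has a closed form; with a Cauchy C(0, 1) reference, the surprise function becomes the posterior-to-Cauchy density ratio.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def cauchy_pdf(x, loc=0.0, scale=1.0):
    return 1.0 / (math.pi * scale * (1.0 + ((x - loc) / scale) ** 2))

mu, sigma = 0.5, 0.25   # toy posterior p(delta | x) = N(0.5, 0.25^2)
delta0 = 0.0            # sharp null hypothesis H0: delta = 0

# Flat reference r(delta) = 1: the surprise function is the posterior itself,
# the tangential set is {delta : |delta - mu| < |delta0 - mu|}, and its
# posterior mass is ev(H0) = 2 * Phi(z) - 1 with z = |delta0 - mu| / sigma.
z = abs(delta0 - mu) / sigma
ev_against = math.erf(z / math.sqrt(2))
print(round(ev_against, 3))            # -> 0.954 for these toy values

# Cauchy C(0, 1) reference: s(delta) = p(delta | x) / c(delta); values with
# s(delta) >= 1 are corroborated by the data relative to the prior.
s = lambda d: normal_pdf(d, mu, sigma) / cauchy_pdf(d)
print(s(0.5) >= 1.0, s(2.0) >= 1.0)    # -> True False
```

For this toy posterior the null value lies two posterior standard deviations from the mean, so roughly 95% of the posterior mass is more consistent with the data than δ = 0.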
It is crucial to note that it is not possible to utilise the e-value ev(H0) to confirm the null hypothesis H0 (Kelter, 2020a). However, the FBST can be generalized into an extended framework which allows for hypothesis confirmation and is itself an active topic of ongoing research (Esteves et al., 2019). Additionally, the e-value ev(H0) can be used to reject H0 if ev(H0) is sufficiently small, based on asymptotic arguments (Pereira et al., 2008, Section 5). Pereira et al. (2008) showed that the e-value asymptotically follows a chi-square distribution, where M is the posterior mode calculated over the entire parameter space Θ and m is the posterior maximum over Θ_H0. The p-value associated with the Bayesian evidence in support of H0 is then calculated as the upper tail of the χ² density with k − h degrees of freedom, starting from −2λ(m0):

p := 1 − F_{k−h}(−2λ(m0)).   (7)

Here, k is the dimension of the parameter space Θ, and h is the dimension of the null set Θ_H0. The quantity m0 is the observed value of m, and λ(t) = ln l(t) is the logarithm of the relative likelihood function, where l(t) = L(t)/L(M) is the relative likelihood. F_{k−h} denotes the cumulative distribution function of the chi-square distribution with k − h degrees of freedom (and F_k analogously). This p-value has a frequentist interpretation: Based on equation (7), it can be interpreted as a Bayesian significance value which quantifies the probability of obtaining ev̄(H0) or even less evidence in support of the null hypothesis H0. Consequently, after observing m0 and M0, one only needs to calculate the Euclidean distance d0 = ‖m0 − M0‖2, and evaluating the χ²_k distribution's cumulative distribution function at this distance yields the corresponding p-value. Based on a threshold (like 0.05), one can then decide whether or not to reject the null hypothesis H0 : θ = θ0.
However, if a p-value is required which is closest to the frequentist p-value in interpretation, one should use the standardized e-value sev(H0), as defined in (Borges and Stern, 2007, Section 2.2) and in (Pereira and Stern, 2020, Section 3.3). The standardized e-value is based on

sev̄(H0) := F_{k−h}(F_k^{−1}(ev(H0))).   (8)

Here, F_k^{−1} is the quantile function of the cumulative distribution function of the χ² distribution with k degrees of freedom. sev̄(H0) can, as a consequence, be interpreted as the probability of obtaining less evidence than ev(H0) against the null hypothesis H0. Defining

sev(H0) := 1 − sev̄(H0),   (9)

sev(H0) can then be interpreted as the probability of obtaining ev(H0) or more evidence against H0, which is closely related to the interpretation of a frequentist p-value. However, the p-value operates in the sample space, while the standardized e-value operates in the parameter space. The standardized e-value can be used as a Bayesian replacement of the frequentist p-value while being very similar in interpretation. For theoretical properties of sev(H0), see Borges and Stern (2007) and Pereira and Stern (2020).
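The standardized e-value only involves chi-square distribution functions, so it is cheap to compute once ev(H0) is known. The following Python sketch applies sev(H0) = 1 − F_{k−h}(F_k^{−1}(ev(H0))) to the e-value ev(H0) ≈ 0.8306 reported in Example 1 below, where k = 3 and h = 2; the closed-form chi-square CDFs for 1 and 3 degrees of freedom and the bisection-based quantile are implementation choices made here for self-containedness, not part of the package.

```python
import math

def chi2_cdf(x, k):
    """Chi-square CDF; closed forms for k = 1 and k = 3 suffice for this sketch."""
    if x <= 0:
        return 0.0
    if k == 1:
        return math.erf(math.sqrt(x / 2))
    if k == 3:
        return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise NotImplementedError("only k = 1 and k = 3 implemented in this sketch")

def chi2_quantile(p, k):
    """Quantile function F_k^{-1} via simple bisection."""
    lo, hi = 0.0, 1e3
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def standardized_evalue(ev_against, k, h):
    """sev(H0) = 1 - F_{k-h}( F_k^{-1}( ev(H0) ) )."""
    return 1.0 - chi2_cdf(chi2_quantile(ev_against, k), k - h)

sev = standardized_evalue(0.8306, k=3, h=2)
print(round(sev, 3))   # close to the sev(H0) of about 0.025 reported in Example 1
```

A small sev(H0) corresponds to a large amount of evidence against H0, mirroring the decision logic of a frequentist p-value.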
In the examples below, the Bayesian evidence against H0, the e-value ev(H0), is reported, and the standardized e-values sev(H0) are given as well.

Overview and functionality of the fbst package
The centerpiece of the fbst package is the fbst() function, which is used to perform the FBST. In addition to the fbst() function, the package provides customised summary() and plot() functions which allow users to print the results of a FBST or obtain a visualisation of their results to communicate and share them. The fbst() function has the following structure:

fbst(posteriorDensityDraws, nullHypothesisValue, dimensionTheta, dimensionNullset, FUN, par)

Here, posteriorDensityDraws needs to be a numeric vector holding the posterior parameter draws obtained via MCMC or any other numerical method of choice.¹ The argument nullHypothesisValue is the value specified in the null hypothesis H0 : θ = θ0, and dimensionTheta is the dimension of the parameter space Θ. dimensionNullset is the dimension of the null set Θ_H0, and FUN and par are additional arguments which only need to be specified when a user-defined reference function r(θ) is desired. In general, FUN should be the name of the reference function to be used, and par should be a list of parameters which this reference function utilises (e.g. the location and scale parameters when the reference function is a Cauchy prior). Details will be given in the examples below.
The fbst() function returns an object of the class fbst, which stores several useful details and the results of the conducted FBST. To obtain a concise summary of the FBST, the summary() function of the class fbst can be used. To visualise the FBST, the plot() function of the fbst class can be used. Details are provided in the examples below.
From an algorithmic perspective, the fbst package proceeds via the following steps when computing the e-value via the fbst() function:

1. Based on the posterior parameter samples posteriorDensityDraws, the posterior density p(θ|x) is estimated via a Gaussian kernel density estimator, resulting in a posterior density estimate p̂(θ|x). The Gaussian kernel is used due to well-known Bayesian asymptotics of posterior distributions, the Bernstein-von Mises theorem (Held and Sabanés Bové, 2014).
2. Based on this posterior density estimate p̂(θ|x), the surprise function s(θ) is estimated (i) as the posterior density estimate p̂(θ|x) if no arguments FUN and par are supplied, so that a flat reference function r(θ) = 1 is used as the default, or (ii) as the ratio p̂(θ|x)/r(θ) if the arguments FUN and par are supplied. The result is a surprise function estimate ŝ(θ).
3. The surprise function estimate ŝ(θ) is evaluated at the null hypothesis value supplied via the argument nullHypothesisValue, resulting in the value ŝ0.
4. The e-value ev(H0) is computed via numerical integration of the posterior density estimate p̂(θ|x) over the tangential set, which is determined via a linear search on the vector posteriorDensityDraws that includes all values θ fulfilling the condition ŝ(θ) > ŝ0.
5. The p-value associated with the e-value ev̄(H0) in favour of the null hypothesis H0 and the standardized e-value sev(H0) are computed.
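The steps above can be sketched in a few lines of Python for a scalar parameter. This is a simplified illustration, not the package's actual implementation: it assumes Silverman's bandwidth rule for the Gaussian kernel density estimate, defaults to a flat reference function, and approximates the integral over the tangential set by the fraction of posterior draws falling into it.

```python
import numpy as np

def evalue_against(draws, null_value, reference=None):
    """Sketch of the e-value computation from posterior draws (steps 1-4).
    Assumptions: scalar parameter, Gaussian KDE with Silverman's bandwidth,
    flat reference function if none is supplied; the integral over the
    tangential set is approximated by the share of posterior draws in it."""
    draws = np.asarray(draws, dtype=float)
    n = draws.size
    iqr = np.percentile(draws, 75) - np.percentile(draws, 25)
    h = 0.9 * min(draws.std(ddof=1), iqr / 1.34) * n ** (-0.2)  # Silverman's rule

    def posterior_density(points):
        # Step 1: Gaussian kernel density estimate of p(theta | x)
        z = (np.atleast_1d(points)[:, None] - draws[None, :]) / h
        return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

    ref = reference or (lambda t: np.ones_like(np.atleast_1d(t), dtype=float))
    surprise = lambda t: posterior_density(t) / ref(t)   # Step 2: s(theta)
    s0 = surprise(null_value)[0]                         # Step 3: s at theta_0
    # Step 4: posterior mass of the tangential set {theta : s(theta) > s0}
    return float(np.mean(surprise(draws) > s0))

rng = np.random.default_rng(42)
toy_draws = rng.normal(0.5, 0.25, size=2000)   # toy "posterior" draws
ev = evalue_against(toy_draws, 0.0)
```

For these 2,000 draws from a toy N(0.5, 0.25²) posterior with null value 0, the estimate lands close to the exact tangential-set mass 2Φ(2) − 1 ≈ 0.95.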
In summary, the FBST is based only on simple numerical optimization and integration, which makes it a computationally cheap option. This is a benefit in particular when the parameter space Θ is high-dimensional (Pereira and Stern, 2020; Stern, 2003; Kelter, 2020a). Also, the presence of nuisance parameters does not trouble the computation, unlike, for example, for the Bayes factor, where computing the marginal likelihoods can quickly become difficult (Stern, 2003).
Example 1: Two-sample Bayesian t-test

As a preliminary note, all analyses can be reproduced by following the provided code.² To demonstrate how to use the fbst package, we start with the two-sample t-test, a widely used statistical model in the cognitive sciences (Nuijten et al., 2016). We use the two-sample Bayesian t-test of Rouder et al. (2009) together with simulated data. The recommended medium Cauchy prior C(0, √2/2) was assigned to the effect size δ. Observations in the first group were simulated as N(0, 1.7), and observations belonging to the second group were generated according to the N(0.8, 3) distribution. As a consequence, the resulting true effect size δ according to Cohen (1988) is given as

δ = (µ2 − µ1) / σ = 0.8 / √((1.7² + 3²)/2) ≈ 0.33,

which equals a small effect size. The code to simulate the data is given in listing 2. The corresponding Bayes factor BF10 for the alternative hypothesis H1 : δ ≠ 0 against the null hypothesis H0 : δ = 0 is given as BF10 = 0.91, which does not indicate evidence worth mentioning according to Jeffreys (1961) or van Doorn et al. (2019). The slight favour towards H0 can be attributed to the medium Cauchy prior used, which centres the prior probability mass closely around small effect sizes (and no effect, too). Figure 2 shows a prior-posterior plot for the example. The code to compute the Bayes factor is given in listing 3. To perform the FBST and compute the e-value, we first install and load the R package from CRAN by executing the code in listing 4. Note that in the example, the parameter space Θ consists of three parameters: the mean µ1 in the first group, the mean µ2 in the second group, and the variance σ². As a consequence, the argument dimensionTheta is set to dimensionTheta=3.
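The true effect size of the simulation can be double-checked with a few lines of Python (assuming, as stated above, that 1.7 and 3 are the group standard deviations and that the effect size is computed with the root mean square of the two standard deviations):

```python
import math

# True effect size for the simulated groups N(0, 1.7) and N(0.8, 3)
mu1, sd1 = 0.0, 1.7
mu2, sd2 = 0.8, 3.0
pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)   # root mean square of the SDs
delta = (mu2 - mu1) / pooled_sd
print(round(delta, 2))   # -> 0.33, a small effect according to Cohen (1988)
```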
The null set Θ_H0 consists of the set {µ1 = µ2, σ²}, which is two-dimensional, so that dimensionNullset = 2. The object stored in the variable resFlatSim is an object of the class fbst, which stores several values used in the summary() and plot() functions of the package. These are available to communicate and visualise the results of the FBST. For example, we can access the e-value ev(H0) as follows (see listing 5):

Listing 5 Example 1 - The Bayesian e-value against the null hypothesis of no effect
resFlatSim@eValue
[1] 0.8305998

Instead of accessing each attribute manually, the summary() function of the fbst package provides a more convenient option to obtain a summary of the FBST and print the relevant quantities. Based on the results, we can see that there is some evidence against the null hypothesis according to the Bayesian e-value ev(H0) against H0 (compare equation (5)). The corresponding p-value ≈ 0.146 is not significant if a standard threshold of 0.05 is used, but the standardized e-value sev(H0) ≈ 0.025 < 0.05 is. Note that when a p-value is used for hypothesis testing, it is recommended to use the standardized e-value (Borges and Stern, 2007; Pereira and Stern, 2020), so one would reject the null hypothesis H0 : δ = 0 in this case. However, it is also possible to use only the Bayesian evidence ev(H0) against H0 without any p-value to quantify the evidence continuously.
To visualise the results, we use the plot() function of the fbst package:

Listing 7 Example 1 - Visualising the results of the FBST
plot(resFlatSim)

The result is shown in figure 3a: The blue shaded area under the surprise function (which is by default the posterior distribution, that is, a flat reference function r(δ) = 1 is used by default by the fbst() function) is the Bayesian evidence against H0, the e-value ev(H0) ≈ 0.83 (compare listing 6). The red shaded area is the e-value ev̄(H0) in favour of H0, which is ev̄(H0) ≈ 1 − 0.83 = 0.17. Instead of a flat reference function r(δ) = 1, one could also use a more reasonable prior distribution. For example, as small to medium effect sizes are to be expected in the cognitive sciences and psychology, Rouder et al. (2009) recommended a medium Cauchy prior C(0, √2/2) as a default prior on the effect size. To see which parameter values δ have been corroborated by observing the data (compared to this prior assumption), we can use this prior as the reference function r(δ) = C(0, √2/2); the resulting surprise function is shown in figure 3b. The code to produce the FBST based on a Cauchy reference density is given in listing 8: There, the FUN argument is supplied with the name of the density to be used, and the par argument is supplied with a list of arguments for this density. As the Cauchy distribution has a location and a scale parameter, we supply these here. Notice that the blue point, which indicates the surprise function value s(0) of the null hypothesis parameter δ = 0, is larger than one. This means that the null hypothesis value has been corroborated by the data. However, all parameter values in the tangential set have been corroborated even more by the data than the null value δ = 0.
Based on the continuous quantification, there is again strong evidence against the null hypothesis when changing the reference function to a medium Cauchy prior: More than 90% of the posterior distribution's parameter values attain a larger surprise function value than the null hypothesis value. The resulting standardized e-value sev(H0) is also significant.
Example 2: Directional two-sample Bayesian t-test

Example 1 showed how to apply the FBST in the setting of the Bayesian two-sample t-test. Example 2 is a slight modification of Example 1: Instead of testing a two-sided hypothesis, we now turn to directional hypotheses and show how these can easily be tested via the fbst package, too. We use the data of Moore et al. (2012), which provide the reading performance of two groups of pupils: a control group and a treatment group which was given directed reading activities. The data are freely available in the built-in data library of the open-source software JASP.³ We test the null hypothesis H0 : δ = 0 against the directional alternative H1 : δ < 0, which is equivalent to H1 : µ1 < µ2, where the measured quantity is the performance of pupils in the Degree of Reading Power (DRP) test (Moore et al., 2012).
First, we save the data in a .csv file (which is called DirectedReadingActivities.csv in listing 9), set the working directory and load the data 4:

Listing 9 Example 2 – Loading the data

setwd('...')  # Change to where the data are stored on your machine
library(dplyr)
dra = read.csv("DirectedReadingActivities.csv", sep = ",")
head(dra)

  id group g drp
1  1 Treat 0  24

3 See www.jasp-stats.org
4 The data set is also provided as a .csv file at the OSF repository https://osf.io/u6xnc/.
The dimensions of Θ and Θ_H0 are identical to Example 1, and the Bayesian e-value ev(H0) ≈ 0.986 expresses strong evidence against the null hypothesis H0: δ = 0. Also, the standardized e-value sev(H0) ≈ 0.001 < 0.05 is significant and leads to the same conclusion if a threshold of 0.05 is applied. The results are visualised in figure 4. Figure 4a shows the FBST when a wide half-Cauchy prior C+(0, 1) is used as the reference function r(δ) (Rouder et al., 2009) 5. Figure 4a is produced by the code in listing 12, where the additional parameter rightBoundary = 0 needs to be added to inform the plot() function that a one-sided hypothesis was used. Should the alternative be H1: δ > 0, one would supply the argument leftBoundary = 0 to the plot() function instead. Based on the continuous quantification of evidence against H0 in form of ev(H0) and the standardized e-value sev(H0), one would reject the null hypothesis H0: δ = 0 in favour of the alternative H1: δ < 0. That is, the performance in the treatment group is better than in the control group, which was not given directed reading activities.
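A directional analogue can be sketched in the same self-contained way (all draws and numbers below are simulated toy values, not the Moore et al. data): under the directional model the posterior of δ is restricted to the negative half-line, and a half-Cauchy C+(0, 1) density serves as the reference function.

```r
# Toy sketch of a directional FBST (not fbst package code): posterior draws are
# restricted to delta < 0 and compared against a half-Cauchy C+(0,1) reference.
set.seed(1)
allDraws <- rnorm(50000, mean = -0.6, sd = 0.3)  # toy posterior of delta
negDraws <- allDraws[allDraws < 0]               # directional model: delta < 0
halfCauchy <- function(x) 2 * dcauchy(x, location = 0, scale = 1)  # C+(0,1)

kde  <- density(negDraws, to = 0)                # KDE up to the boundary at 0
post <- approxfun(kde$x, kde$y, rule = 2)
s    <- function(theta) post(theta) / halfCauchy(theta)  # surprise function
evDir <- mean(s(negDraws) > s(0))                # evidence against H0: delta = 0
```

With the bulk of the toy posterior well below zero, almost all draws are more surprising than the null value, mirroring the large ev(H0) reported above.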

Example 3: Bayesian logistic regression
As a third example, we demonstrate how to use the FBST via the fbst package in the context of the Bayesian logistic regression model (McElreath, 2020). Notice that while we focus on the standard logistic model here, the procedure is applicable to any regression model of interest, like probit or linear regression models. We use data from the Western Collaborative Group Study (WCGS) of Rosenman et al. (1975), in which 3154 healthy young men aged 39 to 59 from the San Francisco area were assessed for their personality type. All were free from coronary heart disease at the start of the research. Eight and a half years later, any change in this situation was recorded. We use a subset of n = 3140 participants, where 14 participants have been excluded because of incomplete data. The data set is freely available in the faraway R package, so we first load and prepare the data as shown in listing 13.
Listing 13 Example 3 – Loading the data

For illustration purposes, we use a Bayesian logistic regression model which studies the influence of the covariates age, height, weight, systolic blood pressure (sdp), diastolic blood pressure (dbp), fasting serum cholesterol (chol) and the number of cigarettes smoked per day (cigs) on the outcome chronic heart disease (yes / no), stored in the response variable chd.
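The preparation step described here (loading the data and excluding incomplete participants) can be sketched as follows. All names below are hypothetical: the toy data frame merely mimics a few WCGS columns so the sketch runs without the faraway package.

```r
# Hypothetical stand-in for the preparation step described in the text: keep
# complete cases (in the real WCGS data, 3154 - 14 = 3140 participants remain)
# and recode the outcome chd to 0/1 for the logistic regression.
wcgsToy <- data.frame(
  age    = c(49, 42, NA, 55),
  weight = c(150, 160, 155, NA),
  chd    = c("no", "yes", "no", "no")
)
wcgsToy <- na.omit(wcgsToy)                      # drop incomplete participants
wcgsToy$chd <- as.integer(wcgsToy$chd == "yes")  # binary response for the model
```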
The model is fit via the Hamiltonian Monte Carlo sampler Stan (Carpenter et al., 2017; Kelter, 2020c), which uses the No-U-Turn sampler of Hoffman and Gelman (2014) to sample from the posterior distribution. We obtain the posterior distribution of the intercept and the seven regression coefficients β1, ..., β7, belonging to the seven covariates included in the model. We use the rstanarm package (Goodrich et al., 2020) for fitting the Bayesian logistic regression model, and the code to prepare the data for Stan is given in listing 14. The standard weakly informative prior distribution βj ∼ N(0, 2.5) is assigned to the regression coefficients βj, j = 1, ..., 7, and the intercept β0 is assigned the weakly informative default prior β0 ∼ N(0, 10) recommended by Gabry and Goodrich (2020). Listing 15 shows the code to fit the model via the rstanarm package, summarise and plot the results:

plot(post_m1, "areas", prob = 0.95, prob_outer = 1,
     pars = c("age", "height", "weight", "sdp", "dbp", "chol", "cigs"))

Figure 5 shows the marginal posterior distributions of the regression coefficients βj for the Bayesian logistic regression model in Example 3.
To compute the FBST on the regression coefficients, we need to extract the posterior MCMC sample first, as shown in listing 16. For illustration purposes, we conduct the FBST on the regression coefficient belonging to the covariate weight. The FBST is computed using the normal prior N(0, 2.5) as reference function, which was also used to fit the model. This way, the surprise function quantifies which parameter values βj have been corroborated more by observing the data than the null value βj = 0. The results are shown in figure 6, which is produced via the plot() function call in listing 16. Based on the standardized e-value sev(H0) ≈ 0.0000267 and the Bayesian evidence against H0, the e-value ev(H0) ≈ 0.9759, one would reject the null hypothesis H0: βj = 0.
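The reported pair (ev(H0) ≈ 0.9759, sev(H0) ≈ 0.0000267) is consistent with the chi-squared asymptotics linking the two quantities: with k = dim Θ and h = dim Θ_H0, the standardized e-value is sev(H0) = 1 − F_{k−h}(F_k⁻¹(ev(H0))), where F_k denotes the chi-squared CDF with k degrees of freedom. A minimal sketch (sevSketch is a hypothetical helper, not a function of the fbst package):

```r
# Asymptotic standardized e-value from the e-value (hypothetical helper):
# sev(H0) = 1 - pchisq(qchisq(ev, df = k), df = k - h).
sevSketch <- function(ev, dimTheta, dimNullset) {
  1 - pchisq(qchisq(ev, df = dimTheta), df = dimTheta - dimNullset)
}
sev <- sevSketch(ev = 0.9759, dimTheta = 8, dimNullset = 7)
# sev is of the order 1e-5, matching the value reported above
```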

Discussion
This paper introduced the R package fbst for computing the Full Bayesian Significance Test and the e-value for testing a sharp hypothesis against its alternative. The conceptual approach and the statistical theory of the FBST were detailed, and three examples of statistical models frequently used in psychology and the cognitive sciences highlighted how the FBST can be computed in practice via the fbst R package. It was shown that both one-sided and two-sided hypotheses can be tested with the fbst package. The package's core function fbst() requires only a posterior MCMC sample, so it should be applicable to a wide range of statistical models used in the cognitive sciences and psychology. The examples demonstrated that it is simple to combine the FBST via the fbst package with widely used libraries like rstanarm (Goodrich et al., 2020) or the BayesFactor package (Morey and Rouder, 2018).
The summary and plot functions provided in the package allow intuitive use and produce appealing visualisations of the FBST results, which simplifies sharing and communicating the results with colleagues. We omitted simulation studies in this paper because these were recently conducted by Kelter (2020a), to which the interested reader is referred.
For more details on the theoretical properties of the FBST, we also refer the reader to Pereira and Stern (2020).
To conclude, we direct attention to some limitations and possible extensions of the FBST and the fbst package presented in this paper. First, the fbst package is widely applicable, but this strength can also be interpreted as a limitation: the package requires a posterior distribution which has been derived analytically or numerically to conduct the FBST and compute the e-value, so it is not a standalone solution.
Fig. 6 Visualisation of the FBST for H0: βj = 0 against H1: βj ≠ 0 for the regression coefficient of the covariate weight in the Bayesian logistic regression model for the WCGS study

Second, the core functionality in the current form is restricted to computing, summarising and visualising the FBST. Future extensions could include more detailed analysis results like robustness checks depending on the reference function used, see van Doorn et al. (2019). Also, in its current form the package uses only posterior MCMC draws, and future versions could provide the option to supply the posterior as a closed-form function. Another option to extend the functionality would be to make various algorithms available to estimate the posterior density based on the posterior draws: by now, only Gaussian kernel density estimation is used. In small sample situations the asymptotics of Bayesian posterior distributions guaranteed by the Bernstein-von-Mises theorem can be questionable, and other approaches like spline-based interpolation or non-Gaussian kernels may be more useful.
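The density-estimation alternatives mentioned here are straightforward to prototype in base R (a sketch on simulated draws; none of this is part of the fbst package):

```r
# Prototyping alternatives to the default Gaussian KDE on toy posterior draws.
set.seed(7)
draws <- rnorm(2000)                                 # stand-in posterior draws
kdeGauss <- density(draws)                           # default: Gaussian kernel
kdeEpan  <- density(draws, kernel = "epanechnikov")  # a non-Gaussian kernel
postSpline <- splinefun(kdeGauss$x, kdeGauss$y)      # spline interpolation of the KDE
s0 <- postSpline(0)    # estimated posterior density at the null value
```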
Third, while the standardized e-values may be used as a replacement for frequentist p-values, they are also based on asymptotic arguments, and future research is needed to study the behaviour of the standardized e-values sev(H0) for small samples. This is why we recommend a continuous interpretation of the Bayesian e-value ev(H0) over a threshold-oriented interpretation via standardized e-values sev(H0).
In closing, it must be emphasized that we do not argue against the appropriate use of p-values, Bayes factors or any other suitable method of hypothesis testing. However, the ongoing debate about the concept of statistical significance shows that it is useful to explore existing alternatives for statistical hypothesis testing and to investigate the relationships between these approaches from both a theoretical and a practical perspective (Berger and Sellke, 1987; Makowski et al., 2019b; Liao et al., 2020). The fbst R package introduced in this paper could contribute in particular to the former, as simulation studies can easily be carried out by employing the package, see for example Kelter (2020a).
There is much value in testing a sharp null hypothesis against its alternative in the cognitive sciences and psychology (Berger et al., 1994, 1997; Rouder et al., 2009). While there are also other useful approaches such as equivalence testing (see Lakens, 2017; Lakens et al., 2018; Kruschke and Liddell, 2018b; Kruschke, 2018), the FBST has shown itself to be an attractive alternative to NHST and p-values with desirable theoretical and practical properties (Kelter, 2020a; Pereira and Stern, 2020; Esteves et al., 2019). It is hoped that this package will be useful to researchers from the cognitive sciences and psychologists who are interested in a fully Bayesian alternative to null hypothesis significance testing which requires only minor methodological changes, but offers all the benefits of a fully Bayesian data analysis.
Fig. 1 The FBST and the e-value ev(H0) against H0: δ = 0 in a Bayesian two-sample t-test, where δ is the effect size. (a): A flat reference function r(δ) = 1 is used, and the solid line is the resulting posterior distribution p(δ|x) after observing the data. The supremum s* = s(0) over the null set is visualised as the blue point. The blue shaded area corresponds to the cumulative surprise function W(0), which is the integral over the tangential set T(0) of H0: δ = 0. This is the e-value ev(H0) against H0. The red area is the corresponding integral over the complement of the tangential set, and equals the e-value in favour of H0: δ = 0. (b): The same situation as in (a), but now a Cauchy C(0, 1) prior has been used as reference function r(δ).

Listing 4 Example 1 – Hypothesis testing via the FBST

install.packages("fbst")
library(fbst)
resFlatSim = fbst(posteriorDensityDraws = p, nullHypothesisValue = 0,
                  dimensionTheta = 3, dimensionNullset = 2)

Listing 8 Example 1 – The FBST using a medium Cauchy prior as reference function

resMediumSim = fbst(posteriorDensityDraws = p, nullHypothesisValue = 0,
                    dimensionTheta = 3, dimensionNullset = 2,
                    FUN = dcauchy, par = list(location = 0, scale = sqrt(2)/2))

Fig. 3 (a) Visualisation of the FBST for the Bayesian two-sample t-test in Example 1 using a flat reference function r(δ) = 1; (b) Visualisation of the FBST for the Bayesian two-sample t-test in Example 1 using a medium Cauchy prior as reference function r(δ) = C(0, √2/2)

Listing 16 Example 3 – Extracting the posterior MCMC draws, performing the FBST and visualising the result

posteriorDrawsMatrix = as.matrix(post_m1)
weightDraws = posteriorDrawsMatrix[, "weight"]
resWeight = fbst(posteriorDensityDraws = weightDraws, nullHypothesisValue = 0,
                 dimensionTheta = 8, dimensionNullset = 7,
                 FUN = dnorm, par = list(mean = 0, sd = 2.5))
plot(resWeight)

Fig. 5 Marginal posterior distributions of the regression coefficients β j in the Bayesian logistic regression model in Example 3