Abstract
We examined the relationship between the Bayes factor and the separation of credible intervals in between- and within-subject designs under a range of effect and sample sizes. For the within-subject case, we considered five intervals: (1) the within-subject confidence interval of Loftus and Masson (1994); (2) the within-subject Bayesian interval developed by Nathoo et al. (2018), whose derivation conditions on estimated random effects; (3) and (4) two modifications of (2) based on a proposal by Heck (2019) to allow for shrinkage and account for uncertainty in the estimation of random effects; and (5) the standard Bayesian highest-density interval. We derived and observed through simulations a clear and consistent relationship between the Bayes factor and the separation of credible intervals. Remarkably, for a given sample size, this relationship is described well by a simple quadratic exponential curve and is most precise in case (4). In contrast, interval (5) is relatively wide due to between-subjects variability and is likely to obscure effects when used in within-subject designs, rendering its relationship with the Bayes factor unclear in that case. We discuss how the separation percentage of (4), combined with knowledge of the sample size, could provide evidence in support of either a null or an alternative hypothesis. We also present a case study with example data and provide an R package ‘rmBayes’ to enable computation of each of the within-subject credible intervals investigated here using a number of possible prior distributions.
Data availability
The data used in the analyses are available via the Open Science Framework at https://osf.io/x2pvw/.
Code availability
All R code is available via the Open Science Framework at https://osf.io/x2pvw/.
Notes
Other types of averages can also be used. The root mean square is preferable because it connects the pooled confidence interval width (\(l={t}_{1-\frac{\alpha }{2},\ {n}_1+{n}_2-2}^{\ast}\cdot {s}_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\)) for the difference between means in a two-sample t-test to the confidence interval widths (\({l}_i={t}_{1-\frac{\alpha }{2},\ {n}_1+{n}_2-2}^{\ast}\cdot {s}_p\sqrt{\frac{1}{n_i}}\)) for the population means in an unbalanced one-way ANOVA with two conditions: \({l}^2={l}_1^2+{l}_2^2\), where \(s_p\) is the pooled estimate of the common standard deviation.
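The quadrature relation stated in this note can be verified directly: because each \(l_i\) shares the same critical value and pooled standard deviation,

```latex
l_1^2 + l_2^2
  = \left(t^{*}_{1-\frac{\alpha}{2},\,n_1+n_2-2}\, s_p\right)^{2}
    \left(\frac{1}{n_1} + \frac{1}{n_2}\right)
  = l^2 ,
```

which is exactly why the root mean square, rather than another average, makes the two-sample interval width decompose across the per-condition widths.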
The Stan syntax target += -log(sigma); in place of target += -2*log(sigma); has been implemented in version 0.1.15 and later of the ‘rmBayes’ R package to accurately reflect the Jeffreys prior in Equation 2. Regardless of which syntax is used, there is little difference in graphical results, which can be seen as an example of a sensitivity analysis for different possible priors. ‘rmBayes’ 0.1.15 was used for the computations reported in this article.
By calling rmHDI(recall.long, iter = 2e4, seed = 277)$width, Apple-silicon macOS may return 0.5613043, Intel-based macOS may return 0.5601921, Compute Canada Cedar may return 0.5600443, and Windows may return 0.5589209.
As a sufficient but not necessary condition for conducting repeated-measures ANOVA, the compound symmetry assumption states that all conditions have equal population variance, and all pairs of conditions have equal covariance. Hence, compound symmetry is a restrictive form of circularity. See remarks in Cousineau (2019, p. 232).
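In matrix form (a standard way to write this assumption, not notation taken from the article), compound symmetry constrains the a × a covariance matrix of the conditions to have a common variance \(\sigma^2\) on the diagonal and a common covariance \(\rho\sigma^2\) off the diagonal:

```latex
% Compound symmetry: equal variances and equal pairwise covariances
\boldsymbol{\Sigma}
  = \sigma^2\left[(1-\rho)\,\mathbf{I}_a + \rho\,\mathbf{J}_a\right]
  = \sigma^2
  \begin{pmatrix}
    1      & \rho   & \cdots & \rho   \\
    \rho   & 1      & \cdots & \rho   \\
    \vdots & \vdots & \ddots & \vdots \\
    \rho   & \rho   & \cdots & 1
  \end{pmatrix},
```

where \(\mathbf{I}_a\) is the identity matrix and \(\mathbf{J}_a\) is the all-ones matrix. Any matrix of this form satisfies circularity, but not conversely, which is the sense in which compound symmetry is the more restrictive condition.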
At the time of writing, the ‘BayesFactor’ R package by Morey and Rouder (2022) implemented Equation 13 for multiway within-subject designs, but van den Bergh et al. (2022) have recognized the misspecification and started to update the functions accordingly. See also Kruschke (2014, pp. 606–608). These changes will not affect the one-way models used for the simulations in this article.
References
Armitage, P., Berry, G., & Matthews, J. N. S. (2002). Statistical methods in medical research (4th ed.). Bodmin, UK: Blackwell Science. https://doi.org/10.1002/9780470773666
Bartlett, M. S. (1957). A comment on D. V. Lindley’s statistical paradox. Biometrika, 44, 533–534. https://doi.org/10.1093/biomet/44.3-4.533
Bub, D. N., Masson, M. E., & van Noordenne, M. (2021). Motor representations evoked by objects under varying action intentions. Journal of Experimental Psychology: Human Perception and Performance, 47, 53–80.
Campbell, H., & Gustafson, P. (2021). re: Linde et al. (2021) - The Bayes factor, HDI-ROPE and frequentist equivalence testing are actually all equivalent. arXiv, 1–22. https://doi.org/10.48550/arXiv.2104.07834
Carvalho, C. M., Polson, N. G., & Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika, 97, 465–480.
Casella, G., Ghosh, M., Gill, J., & Kyung, M. (2010). Penalized regression, standard errors, and Bayesian lassos. Bayesian Analysis, 5, 369–411.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York: Routledge. https://doi.org/10.4324/9780203771587
Congdon, P. D. (2019). Bayesian hierarchical models with applications using R (2nd ed.). New York: Chapman and Hall/CRC. https://doi.org/10.1201/9780429113352
Cousineau, D. (2019). Correlation-adjusted standard errors and confidence intervals for within-subject designs: A simple multiplicative approach. The Quantitative Methods for Psychology, 15, 226–241.
Craiu, R. V., Gustafson, P., & Rosenthal, J. S. (2022). Reflections on Bayesian inference and Markov chain Monte Carlo. The Canadian Journal of Statistics, 50, 1213–1227.
Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25, 7–29.
Dienes, Z. (2021). Obtaining evidence for no effect. Collabra. Psychology, 7, 1–15.
Eich, E. (2014). Business not as usual. Psychological Science, 25, 3–6.
Etz, A., & Vandekerckhove, J. (2016). A Bayesian perspective on the reproducibility project: Psychology. PLoS ONE, 11, 1–12.
Evett, I. W. (1987). Bayesian inference and forensic science: Problems and perspectives. Journal of the Royal Statistical Society, 36, 99–105.
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175–191.
Faulkenberry, T. J. (2021). The Pearson Bayes factor: An analytic formula for computing evidential value from minimal summary statistics. Biometrical Letters, 58, 1–26.
Faulkenberry, T. J., & Brennan, K. B. (2022). Computing analytic Bayes factors from summary statistics in repeated-measures designs. arXiv, 1–25. https://doi.org/10.48550/arXiv.2209.08159
Franz, V. H., & Loftus, G. R. (2012). Standard errors and confidence intervals in within-subjects designs: Generalizing Loftus and Masson (1994) and avoiding the biases of alternative accounts. Psychonomic Bulletin & Review, 19, 395–404.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A. & Rubin, D. B. (2013). Bayesian data analysis (3rd ed.). New York: Chapman and Hall/CRC. https://doi.org/10.1201/b16018
Greenhouse, S. W., & Geisser, S. (1959). On methods in the analysis of profile data. Psychometrika, 24, 95–112.
Heck, D. W. (2019). Accounting for estimation uncertainty and shrinkage in Bayesian within-subject intervals: A comment on Nathoo, Kilshaw, and Masson (2018). Journal of Mathematical Psychology, 88, 27–31.
Hoekstra, R., Morey, R. D., Rouder, J. N., & Wagenmakers, E.-J. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21, 1157–1164.
Hu, C., Wang, F., Guo, J., Song, M., Sui, J., & Peng, K. (2016). The replication crisis in psychological research. Advances in Psychological Science, 24, 1504–1518.
Huynh, H., & Feldt, L. S. (1976). Estimation of the Box correction for degrees of freedom from sample data in randomised block and split-plot designs. Journal of Educational Statistics, 1, 69–82.
Jaynes, E. T., & Kempthorne, O. (1976). Confidence intervals vs Bayesian intervals. Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science, 6b, 175–257. https://doi.org/10.1007/978-94-010-1436-6_6
Jeffreys, H. (1935). Some tests of significance, treated by the theory of probability. Mathematical Proceedings of the Cambridge Philosophical Society, 31, 203–222.
Jeffreys, H. (1936). Further significance tests. Mathematical Proceedings of the Cambridge Philosophical Society, 32, 416–445.
Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences, 186, 453–461.
Jeffreys, H. (1961). Theory of probability (3rd ed.). Oxford, UK: Oxford University Press. https://global.oup.com/academic/product/theory-of-probability-9780198503682
Jusczyk, P. W., Houston, D. M., & Newsome, M. (1999). The beginnings of word segmentation in English-learning infants. Cognitive Psychology, 39, 159–207.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773–795.
Kline, R. B. (2013). Beyond significance testing: Statistics reform in the behavioral sciences (2nd ed.). Washington, D.C.: American Psychological Association. https://doi.org/10.1037/14136-000
Kotz, S., & Nadarajah, S. (2004). Multivariate t-distributions and their applications. Cambridge University Press. https://doi.org/10.1017/CBO9780511550683
Kruschke, J. K. (2014). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan (2nd ed.). London, UK: Academic Press. https://doi.org/10.1016/B978-0-12-405888-0.09999-2
Kruschke, J. K. (2018). Rejecting or accepting parameter values in Bayesian estimation. Advances in Methods and Practices in Psychological Science, 1, 270–280.
Kruschke, J. K. (2021). Bayesian analysis reporting guidelines. Nature Human Behaviour, 5, 1282–1291.
Lawrence, M. A. (2016). ez: Easy analysis and visualization of factorial experiments. R package version 4.4-0. https://cran.r-project.org/package=ez
Lee, M. D., & Wagenmakers, E.-J. (2014). Bayesian cognitive modeling: A practical course. Cambridge University Press. https://doi.org/10.1017/CBO9781139087759
Liang, F., Paulo, R., Molina, G., Clyde, M. A., & Berger, J. O. (2008). Mixtures of g priors for Bayesian variable selection. Journal of the American Statistical Association, 103, 410–423.
Linde, M., Tendeiro, J., Selker, R., Wagenmakers, E.-J., & van Ravenzwaaij, D. (2021). Decisions about equivalence: A comparison of TOST, HDI-ROPE, and the Bayes factor. Psychological Methods, 1–16. https://doi.org/10.1037/met0000402
Lindley, D. V. (1957). A statistical paradox. Biometrika, 44, 187–192. https://doi.org/10.2307/2333251
Loftus, G. R., & Masson, M. E. J. (1994). Using confidence intervals in within-subject designs. Psychonomic Bulletin & Review, 1, 476–490.
Lovric, M. M. (2020). Conflicts in Bayesian statistics between inference based on credible intervals and Bayes factors. Journal of Modern Applied Statistical Methods, 18, 1–27.
Ly, A., Boehm, U., Heathcote, A., Turner, B. M., Forstmann, B., Marsman, M., & Matzke, D. (2017). A flexible and efficient hierarchical Bayesian approach to the exploration of individual differences in cognitive-model-based neuroscience. Computational Models of Brain and Behavior, 467–479. https://doi.org/10.1002/9781119159193.ch34
Ly, A., Raj, A., Etz, A., Marsman, M., Gronau, Q. F., & Wagenmakers, E.-J. (2018). Bayesian reanalyses from summary statistics: a guide for academic consumers. Advances in Methods and Practices in Psychological Science, 1, 367–374.
Ly, A., Verhagen, J., & Wagenmakers, E.-J. (2016). Harold Jeffreys’s default Bayes factor hypothesis tests: Explanation, extension, and application in psychology. Journal of Mathematical Psychology, 72, 19–32.
Maruyama, Y., & George, E. I. (2011). Fully Bayes factors with a generalized g-prior. The Annals of Statistics, 39, 2740–2765.
Masson, M. E. J. (2011). A tutorial on a practical Bayesian alternative to null-hypothesis significance testing. Behavior Research Methods, 43, 679–690.
Masson, M. E. J., & Loftus, G. R. (2003). Using confidence intervals for graphically based data interpretation. Canadian Journal of Experimental Psychology, 57, 203–220.
Morey, R. D. (2015a). Multiple comparisons with BayesFactor, Part 1. R-Bloggers. https://www.r-bloggers.com/2015/01/multiple-comparisons-with-bayesfactor-part-1/
Morey, R. D. (2015b). Multiple comparisons with BayesFactor, Part 2 - Order restrictions. BayesFactor. https://bayesfactor.blogspot.com/2015/01/multiple-comparisons-with-bayesfactor-2.html
Morey, R. D., Romeijn, J. W., & Rouder, J. N. (2016). The philosophy of Bayes factors and the quantification of statistical evidence. Journal of Mathematical Psychology, 72, 6–18.
Morey, R. D., & Rouder, J. N. (2022). BayesFactor: Computation of Bayes factors for common designs. R package version 0.9.12-4.4. https://cran.r-project.org/package=BayesFactor
Morey, R. D., Rouder, J. N., Pratte, M. S., & Speckman, P. L. (2011). Using MCMC chain outputs to efficiently estimate Bayes factors. Journal of Mathematical Psychology, 55, 368–378.
Nathoo, F. S., Kilshaw, R. E., & Masson, M. E. J. (2018). A better (Bayesian) interval estimate for within-subject designs. Journal of Mathematical Psychology, 86, 1–9.
Nathoo, F. S., & Masson, M. E. J. (2016). Bayesian alternatives to null-hypothesis significance testing for repeated-measures designs. Journal of Mathematical Psychology, 72, 144–157.
Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology, 25, 111–163.
Rouder, J. N., Morey, R. D., Speckman, P. L., & Province, J. M. (2012). Default Bayes factors for ANOVA designs. Journal of Mathematical Psychology, 56, 356–374.
Rouder, J. N., Morey, R. D., Verhagen, J., Swagman, A. R., & Wagenmakers, E.-J. (2017). Bayesian analysis of factorial designs. Psychological Methods, 22, 304–321.
Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16, 225–237.
Schenker, N., & Gentleman, J. F. (2001). On judging the significance of differences by examining the overlap between confidence intervals. The American Statistician, 55, 182–186.
Stan Development Team (2023). RStan: The R interface to Stan. R package version 2.21.8. https://mc-stan.org/
Urry, H. L., van Reekum, C. M., Johnstone, T., Kalin, N. H., Thurow, M. E., Schaefer, H. S., Jackson, C. A., Frye, C. J., Greischar, L. L., Alexander, A. L., & Davidson, R. J. (2006). Amygdala and ventromedial prefrontal cortex are inversely coupled during regulation of negative affect and predict the diurnal pattern of cortisol secretion among older adults. Journal of Neuroscience, 26, 4415–4425.
van den Bergh, D., Wagenmakers, E.-J., & Aust, F. (2022). Bayesian repeated-measures ANOVA: An updated methodology implemented in JASP. PsyArXiv, 1–28. https://doi.org/10.31234/osf.io/fb8zn
Vogel, E. K., Woodman, G. F., & Luck, S. J. (2001). Storage of features, conjunctions, and objects in visual working memory. Journal of Experimental Psychology: Human Perception and Performance, 27, 92–114.
Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14, 779–804.
Wagenmakers, E.-J. (2022). Approximate objective Bayes factors from p-values and sample size: The \(3p\sqrt{n}\) rule. PsyArXiv, 1–50. https://doi.org/10.31234/osf.io/egydq
Wagenmakers, E.-J., Gronau, Q. F., Dablander, F., & Etz, A. (2022). The support interval. Erkenntnis, 87, 589–601.
Wagenmakers, E.-J., Lodewyckx, T., Kuriyal, H., & Grasman, R. (2010). Bayesian hypothesis testing for psychologists: A tutorial on the Savage-Dickey method. Cognitive Psychology, 60, 158–189.
Wagenmakers, E.-J., & Ly, A. (2023). History and nature of the Jeffreys-Lindley paradox. Archive for History of Exact Sciences, 77, 25–72.
Wang, M., & Liu, G. (2016). A simple two-sample Bayesian t-test for hypothesis testing. The American Statistician, 70, 195–201.
Wang, M., & Sun, X. (2014). Bayes factor consistency for one-way random effects model. Communications in Statistics - Theory and Methods, 43, 5072–5090.
Wei, Z., Nathoo, F. S., & Masson, M. E. J. (2022a). rmBayes: Performing Bayesian inference for repeated-measures designs. R package version 0.1.15. https://cran.r-project.org/package=rmBayes
Wei, Z., Yang, A., Rocha, L., Miranda, M. F., & Nathoo, F. S. (2022b). A review of Bayesian hypothesis testing and its practical implementations. Entropy, 24, 1–15.
Wetzels, R., Matzke, D., Lee, M. D., Rouder, J. N., Iverson, G. J., & Wagenmakers, E.-J. (2011). Statistical evidence in experimental psychology: An empirical comparison using 855 t tests. Perspectives on Psychological Science, 6, 291–298.
Zellner, A., & Siow, A. (1980). Posterior odds ratios for selected regression hypotheses. Trabajos de Estadística Y de Investigación Operativa, 31, 585–603.
Acknowledgements
We thank Eric-Jan Wagenmakers for bringing to our attention the potential link between the separation of credible intervals and the Jeffreys-Lindley paradox. We are also grateful for an anonymous referee’s helpful comments on the likelihood principle, the importance of sample size, and evidence for the null hypothesis.
Funding
This work was supported by discovery grants to Farouk S. Nathoo (RGPIN-04044-2020) and Michael E. J. Masson (RGPIN-2015-04773) from the Natural Sciences and Engineering Research Council. Farouk S. Nathoo holds a Tier II Canada Research Chair in Biostatistics for Spatial and High-Dimensional Data.
Ethics declarations
Conflict of interest
The authors declare no conflicts of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A
Proof of Theorem 1
After some substitutions, the Pearson Bayes factor in Equation 9 becomes
By applying Stirling’s formula \(\Gamma \left(y+z\right)\sim {y}^z\Gamma \left(y\right)\) as y → + ∞, the gamma ratio in Equation A1 becomes \(\frac{\Gamma \left(\frac{a\left(n-1\right)}{2}\right)}{\Gamma \left(\frac{an-1}{2}\right)}\sim {\left(\frac{an}{2}\right)}^{\frac{1-a}{2}}\) as n → + ∞.
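The exponent can be checked by direct substitution into Stirling’s formula: taking \(y = \frac{a(n-1)}{2}\) and \(z = \frac{a-1}{2}\), so that \(y + z = \frac{an-1}{2}\),

```latex
\frac{\Gamma\!\left(\frac{a(n-1)}{2}\right)}{\Gamma\!\left(\frac{an-1}{2}\right)}
  = \frac{\Gamma(y)}{\Gamma(y+z)}
  \sim y^{-z}
  = \left(\frac{a(n-1)}{2}\right)^{\frac{1-a}{2}}
  \sim \left(\frac{an}{2}\right)^{\frac{1-a}{2}},
  \qquad n \to +\infty .
```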
We calculated the separation for the standard between-subjects confidence interval in Equation A2 as the absolute value of the difference between the two sample means divided by twice the interval width. Here, we consider only a = 2.
As n → + ∞, \({t}_{1-\frac{\alpha }{2},\ a\left(n-1\right)}^{\ast}\sim {z}_{1-\frac{\alpha }{2}}\), the sample means converge to the population means, and \({\left({SS}_{\mathrm{B}}/{SS}_{\mathrm{W}}\right)}^2\) is assumed to be the least significant term in the exponential series. \({SS}_{\mathrm{B}}=n\sum_{i=1}^{a}{\left({M}_{i\cdot }-M\right)}^2\) reduces to \(\frac{1}{2}n{\left({M}_{1\cdot }-{M}_{2\cdot}\right)}^2\) when a = 2. Plugging Equation A2 into the limit below, we obtain
Hence, the asymptotic approximation of the log Pearson Bayes factor is a quadratic function of the separation of the standard confidence interval for population means in balanced one-way between-subjects designs, as the number of subjects goes to infinity. To check the convergence of the limit, we plot in Fig. 10 the (solid black) fitted line for the relationship between the log JZS-BF10 and the squared separation score, along with the (dashed red) analytic line obtained by plugging the known values into the formula of Theorem 1. As the sample size increases, the two lines become closer. We still expect some variation between these lines even for very large n because Theorem 1 applies to the Pearson Bayes factor, whereas the quadratic exponential is interpolated for the JZS Bayes factor.
Appendix B
R packages to perform Bayesian inference
The ‘rmBayes’ package performs Bayesian interval estimation for both the homoscedastic and heteroscedastic cases in either between- or within-subject designs that include a single independent variable. The Stan-based R source package installation will take a few minutes because models need to be compiled into dynamic shared objects. We recommend using R version 4.0.1 or later and installing the pre-compiled binary package so users do not have to worry about C++ compiler issues. The relevant commands are:
> install.packages("rmBayes", type = "binary")
> library(rmBayes)
The rmHDI function in ‘rmBayes’ provides multiple methods for constructing credible intervals for population means, each based on a different set of priors. The default method implements the NUTS algorithm and constructs the within-subject HDI corresponding to the JZS-HDI case in Table 1. Further documentation of the methods is available on GitHub, https://zhengxiaouvic.github.io/rmBayes/. The following example includes a partial data set, a call to the rmHDI function, and the resulting output. The partial data set is also shown in wide format.
> ## Data are in the long format. 10 subjects. 3 conditions.
> head(recall.long, 2)
Subject Level Response
1 s1 Level1 10
2 s2 Level1 6
> rmHDI(recall.long, whichSubject = "Subject", whichLevel = "Level", whichResponse = "Response", seed = 277) #macOS (Apple chip)
$HDI
lower upper
Level1 10.47101 11.59361
Level2 12.39176 13.51436
Level3 13.55086 14.67346
$`posterior means`
Level1 Level2 Level3
11.03231 12.95306 14.11216
$width
[1] 0.5613014
> ## Same data are in the wide format.
> head(recall.wide, 2)
Level1 Level2 Level3
s1 10 13 13
s2 6 8 8
> rmHDI(data.wide= recall.wide, seed = 277)
HDIs can alternatively be computed using the ‘BayesFactor’ package, which computes Bayes factors for several experimental designs. The anovaBF function can first be used to generate the Bayes factor for a within-subject design.
> library(BayesFactor); set.seed(277)
> anovaBF(Response ~ Level + Subject, data = recall.long, whichRandom = "Subject", iterations = 100000, progress = FALSE)
Bayes factor analysis
--------------
[1] Level + Subject : 36469.12 ±0.32%
Against denominator:
Response ~ Subject
---
Bayes factor type: BFlinearModel, JZS
Then, Gibbs sampling can be used to obtain parameter estimates from the posterior distribution of the numerator model of the Bayes factor object. These estimates are plugged into the interval equations for the LH- or JZS-HDI in Table 1 to construct the within-subject HDI. This method can be implemented using the R code below, which defines a new anovaHDI function. Both anovaHDI and the default rmHDI assume the same priors but use different sampling algorithms to establish the posterior distributions.
> anovaHDI <- function(data, whichSubject, whichLevel, whichResponse, cred, iter) {
    # input arguments are defined as in the rmHDI function
    n <- length(unique(data[, whichSubject]))  # number of subjects
    a <- length(unique(data[, whichLevel]))    # number of conditions
    BF <- BayesFactor::anovaBF(as.formula(paste(whichResponse, "~", whichLevel, "+", whichSubject)),
                               data = data, whichRandom = whichSubject,
                               iterations = iter, progress = FALSE)
    chains <- BayesFactor::posterior(BF, iterations = iter, progress = FALSE)
    # condition means: grand mean (column 1) plus condition effects (columns 2 to a+1)
    mu.chains <- chains[, 2:(a + 1)] + chains[, 1]
    # interval half-widths from the LH-/JZS-HDI equations in Table 1
    widths <- qt((1 + cred) / 2, df = a * (n - 1)) * sqrt(chains[, "sig2"] / n)
    uprs <- mu.chains + widths
    lwrs <- mu.chains - widths
    matrix(c(colMeans(lwrs), colMeans(uprs)), nrow = a,
           dimnames = list(paste("Level", 1:a), c("lower", "upper")))
  }
> set.seed(277)
> anovaHDI(recall.long, "Subject", "Level", "Response", .95, 100000)
lower upper
Level 1 10.50752 11.64187
Level 2 12.41498 13.54934
Level 3 13.56208 14.69644
Appendix C
Monte Carlo error and a data permutation issue
Users should expect different results if they vary the number of iterations or the random seed used in MCMC. Such variability is referred to as Monte Carlo error. We examined Monte Carlo error in computing Bayes factors by applying the anovaBF function 500 times (each containing 100,000 MCMC iterations; the default value is 10,000) with different random seeds on the same set of simulated within-subject data. Among these 500 runs, one Bayes factor outlier was as extreme as 11.7, although the vast majority of values ranged from 4.3 to 5.4. R scripts for this and the following examples are available at https://osf.io/x2pvw/.
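The authors’ full scripts are available at the OSF link above; a minimal sketch of this kind of seed-variation check might look like the following, with the run count reduced from 500 for brevity and the recall.long example data from ‘rmBayes’ standing in for the simulated data set (both substitutions are ours, for illustration only):

```r
# Sketch: gauge Monte Carlo error in the Bayes factor by recomputing it
# with different random seeds on the same data.
# (Illustrative only: 20 runs instead of 500, and recall.long in place
# of the simulated within-subject data set used in the article.)
library(BayesFactor)
library(rmBayes)  # provides the recall.long example data

bf_runs <- sapply(1:20, function(s) {
  set.seed(s)
  bf <- anovaBF(Response ~ Level + Subject, data = recall.long,
                whichRandom = "Subject", iterations = 1e5, progress = FALSE)
  extractBF(bf)$bf  # numeric Bayes factor for Level + Subject vs. Subject
})
range(bf_runs)  # the spread across seeds reflects the Monte Carlo error
```

The spread of bf_runs, and in particular any extreme values, gives a direct sense of how much the reported Bayes factor depends on the seed.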
Similarly, Monte Carlo error is associated with Bayesian interval estimation. We ran the rmHDI and anovaBF functions 100 times each (each run comprising 20,000 MCMC draws; in rmHDI, 2 chains of 10,000 iterations each) with different random seeds on the same simulated within-subject data, and visualized the resulting variability of the posterior mean estimates, HDI widths, and HDI separation via density plots. The whole process was repeated several times for data sets having different Bayes factors, correlations between conditions, and sample sizes. One example is exhibited in Fig. 11.

In different realizations of the draws from the posterior distribution, it is also worth noting a data permutation issue that affects the simulation: the same experimental data are used but permuted by row (e.g., moving Subject 10 up to the second place) or by column (e.g., Level-high and Level-low rather than Level-low and Level-high order). Permuting the entries in a data file will produce slightly different estimates even if the random seed stays the same. To assess the magnitude of this issue relative to Monte Carlo error, we randomly permuted the data by row while fixing the random seed when calling rmHDI. In Fig. 11, the two density plots (permuted data with a constant random seed, versus unpermuted data with varying random seeds) generated from results of the rmHDI function in the ‘rmBayes’ package overlap substantially, indicating that permuting the data produces variability of a magnitude similar to varying the random seed.

Moreover, the functions in the ‘BayesFactor’ package returned less variability in estimates of the posterior means but more variability in estimates of the standard error of the mean (and thus the interval width) and the posterior mean difference, whereas the performance of rmHDI was quite the opposite. The separation percentage is less variable when calling rmHDI, as shown in panel E of Fig. 11.
Although the models and priors assumed by the two packages are identical, the code implementations differ, particularly for Equations 2 and 4 and the MCMC samplers (Gibbs sampling in anovaBF versus NUTS in rmHDI), leading to differences in the variability of the results.
In the rmHDI function, the default setting for the argument permuted is TRUE, meaning that the converted wide-format data are first ordered by their column names in alphabetical order and then placed in ascending order by the first and second columns.
Appendix D
Warning messages regarding sampling and effective Monte Carlo sample size
The Stan website https://mc-stan.org/misc/warnings lists the potential warnings that can arise when running MCMC. Three common warnings relate to exceeding the maximum tree depth (a concern for long execution times), low bulk effective sample size (ESS, indicating that posterior means and medians may be unreliable), and low tail ESS (indicating that posterior variances and tail quantiles may be unreliable). The relevance of these warnings depends on the specific data being analyzed. Visit https://osf.io/x2pvw/ for an example.
We suspect that a high correlation between conditions in a within-subject design may result in slower, inefficient sampling because the computed likelihood has elongated elliptical contours. The latter two warnings indicate that the sampler is moving slowly: after accounting for the correlation across successive draws of the Markov chain, the ESS is low. For example, if the lag-1 autocorrelation of the MCMC sampling output is high (e.g., above .97), then 2,000 iterations can be worth fewer than, say, 100 independent draws. The warning disappears with 10,000 iterations because the effective sample size may then be sufficiently high to cross the threshold in Stan (perhaps an ESS of approximately 500).
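As a rough illustration of this scaling, if successive draws behaved like an AR(1) process with lag-1 autocorrelation rho, a textbook approximation to the effective sample size would be N(1 − rho)/(1 + rho). This is a simplification we introduce for intuition only; Stan’s bulk- and tail-ESS estimators use the full autocorrelation sum and are more sophisticated.

```r
# Rough AR(1) approximation to effective sample size:
# ESS ~ N * (1 - rho) / (1 + rho), with rho the lag-1 autocorrelation.
# (Not the estimator Stan uses; a back-of-envelope sketch only.)
ess_ar1 <- function(N, rho) N * (1 - rho) / (1 + rho)

ess_ar1(2000, 0.97)   # ~30: far fewer than 100 independent draws
ess_ar1(10000, 0.97)  # ~152: five times the iterations, five times the ESS
```

Under this approximation, ESS grows linearly with the number of iterations, which matches the observation that increasing from 2,000 to 10,000 iterations can push the ESS past Stan’s warning threshold.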
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wei, Z., Nathoo, F.S. & Masson, M.E.J. Investigating the relationship between the Bayes factor and the separation of credible intervals. Psychon Bull Rev 30, 1759–1781 (2023). https://doi.org/10.3758/s13423-023-02295-1