
Investigating the relationship between the Bayes factor and the separation of credible intervals

  • Theoretical/Review
  • Published in Psychonomic Bulletin & Review

Abstract

We examined the relationship between the Bayes factor and the separation of credible intervals in between- and within-subject designs under a range of effect and sample sizes. For the within-subject case, we considered five intervals: (1) the within-subject confidence interval of Loftus and Masson (1994); (2) the within-subject Bayesian interval developed by Nathoo et al. (2018), whose derivation conditions on estimated random effects; (3) and (4) two modifications of (2) based on a proposal by Heck (2019) to allow for shrinkage and account for uncertainty in the estimation of random effects; and (5) the standard Bayesian highest-density interval. We derived and observed through simulations a clear and consistent relationship between the Bayes factor and the separation of credible intervals. Remarkably, for a given sample size, this relationship is described well by a simple quadratic exponential curve and is most precise in case (4). In contrast, interval (5) is relatively wide due to between-subjects variability and is likely to obscure effects when used in within-subject designs, rendering its relationship with the Bayes factor unclear in that case. We discuss how the separation percentage of (4), combined with knowledge of the sample size, could provide evidence in support of either a null or an alternative hypothesis. We also present a case study with example data and provide an R package ‘rmBayes’ to enable computation of each of the within-subject credible intervals investigated here using a number of possible prior distributions.


Figures 1–9 appear in the published version of this article.

Data availability

The data used in the analyses are available via the Open Science Framework at https://osf.io/x2pvw/.

Code availability

All R code is available via the Open Science Framework at https://osf.io/x2pvw/.

Notes

  1. Other types of averages can also be used. The root mean square is preferable because it connects the pooled confidence interval width (\(l={t}_{1-\frac{\alpha }{2},\kern0.37em {n}_1+{n}_2-2}^{\ast}\cdot {s}_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\)) for the difference between means in a two-sample t-test to the confidence interval widths (\({l}_i={t}_{1-\frac{\alpha }{2},\kern0.37em {n}_1+{n}_2-2}^{\ast}\cdot {s}_p\sqrt{\frac{1}{n_i}}\)) for the population means in an unbalanced one-way ANOVA with two conditions. \({l}^2={l}_1^2+{l}_2^2\), and sp is the pooled estimate of the common standard deviation.

  2. Faulkenberry and Brennan (2022) extended Equation 9 to a closed-form expression of the Pearson Bayes factor for within-subject designs simply by substituting N = n(a − 1) for the total number of independent observations (Masson, 2011, p. 682).

  3. The Stan syntax target += -log(sigma); in place of target += -2*log(sigma); has been implemented in version 0.1.15 and later of the ‘rmBayes’ R package to accurately reflect the Jeffreys prior in Equation 2. Regardless of which syntax is used, there is little difference in graphical results, which can be seen as an example of a sensitivity analysis across possible priors. ‘rmBayes’ 0.1.15 was used for the computations reported in this article.

  4. By calling rmHDI(recall.long, iter = 2e4, seed = 277)$width, Apple silicon macOS may return 0.5613043, Intel-based macOS may return 0.5601921, Compute Canada Cedar may return 0.5600443, and Windows may return 0.5589209.

  5. As a sufficient but not necessary condition for conducting repeated-measures ANOVA, the compound symmetry assumption states that all conditions have equal population variance, and all pairs of conditions have equal covariance. Hence, compound symmetry is a restrictive form of circularity. See remarks in Cousineau (2019, p. 232).

  6. At the time of writing, the ‘BayesFactor’ R package by Morey and Rouder (2022) implemented Equation 13 for multiway within-subject designs, but van den Bergh et al. (2022) have identified the misspecification and begun updating the functions accordingly. See also Kruschke (2014, pp. 606–608). These changes do not affect the one-way models used for simulations in this article.
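As a numerical check of the width identity in Note 1, the following R sketch confirms that the pooled two-sample interval width satisfies l² = l₁² + l₂²; the group sizes, pooled standard deviation, and alpha level are arbitrary illustrative values:

```r
# Check that the pooled two-sample CI width l satisfies l^2 = l1^2 + l2^2
# (Note 1). n1, n2, sp, and alpha are arbitrary illustrative values.
n1 <- 8; n2 <- 12      # unbalanced group sizes
sp <- 2.5              # pooled estimate of the common standard deviation
alpha <- 0.05
tstar <- qt(1 - alpha / 2, df = n1 + n2 - 2)

l  <- tstar * sp * sqrt(1 / n1 + 1 / n2)  # two-sample t-test interval width
l1 <- tstar * sp * sqrt(1 / n1)           # one-way ANOVA width, condition 1
l2 <- tstar * sp * sqrt(1 / n2)           # one-way ANOVA width, condition 2

all.equal(l^2, l1^2 + l2^2)               # TRUE
```

The identity holds exactly because all three widths share the common factor t*·s_p, which is why the root mean square is the natural way to pool the per-condition widths.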

References

  • Armitage, P., Berry, G., & Matthews, J. N. S. (2002). Statistical methods in medical research (4th ed.). Bodmin, UK: Blackwell Science. https://doi.org/10.1002/9780470773666

  • Bartlett, M. S. (1957). A comment on D. V. Lindley’s statistical paradox. Biometrika, 44, 533–534. https://doi.org/10.1093/biomet/44.3-4.533

  • Bub, D. N., Masson, M. E., & van Noordenne, M. (2021). Motor representations evoked by objects under varying action intentions. Journal of Experimental Psychology: Human Perception and Performance, 47, 53–80.


  • Campbell, H., & Gustafson, P. (2021). re: Linde et al. (2021) - The Bayes factor, HDI-ROPE and frequentist equivalence testing are actually all equivalent. arXiv, 1–22. https://doi.org/10.48550/arXiv.2104.07834

  • Carvalho, C. M., Polson, N. G., & Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika, 97, 465–480.


  • Casella, G., Ghosh, M., Gill, J., & Kyung, M. (2010). Penalized regression, standard errors, and Bayesian lassos. Bayesian Analysis, 5, 369–411.


  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York: Routledge. https://doi.org/10.4324/9780203771587

  • Congdon, P. D. (2019). Bayesian hierarchical models with applications using R (2nd ed.). New York: Chapman and Hall/CRC. https://doi.org/10.1201/9780429113352

  • Cousineau, D. (2019). Correlation-adjusted standard errors and confidence intervals for within-subject designs: A simple multiplicative approach. The Quantitative Methods for Psychology, 15, 226–241.


  • Craiu, R. V., Gustafson, P., & Rosenthal, J. S. (2022). Reflections on Bayesian inference and Markov chain Monte Carlo. The Canadian Journal of Statistics, 50, 1213–1227.


  • Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25, 7–29.


  • Dienes, Z. (2021). Obtaining evidence for no effect. Collabra: Psychology, 7, 1–15.


  • Eich, E. (2014). Business not as usual. Psychological Science, 25, 3–6.


  • Etz, A., & Vandekerckhove, J. (2016). A Bayesian perspective on the reproducibility project: Psychology. PLoS ONE, 11, 1–12.


  • Evett, I. W. (1987). Bayesian inference and forensic science: Problems and perspectives. Journal of the Royal Statistical Society, 36, 99–105.


  • Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175–191.


  • Faulkenberry, T. J. (2021). The Pearson Bayes factor: An analytic formula for computing evidential value from minimal summary statistics. Biometrical Letters, 58, 1–26.


  • Faulkenberry, T. J., & Brennan, K. B. (2022). Computing analytic Bayes factors from summary statistics in repeated-measures designs. arXiv, 1–25. https://doi.org/10.48550/arXiv.2209.08159

  • Franz, V. H., & Loftus, G. R. (2012). Standard errors and confidence intervals in within-subjects designs: Generalizing Loftus and Masson (1994) and avoiding the biases of alternative accounts. Psychonomic Bulletin & Review, 19, 395–404.


  • Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A. & Rubin, D. B. (2013). Bayesian data analysis (3rd ed.). New York: Chapman and Hall/CRC. https://doi.org/10.1201/b16018

  • Greenhouse, S. W., & Geisser, S. (1959). On methods in the analysis of profile data. Psychometrika, 24, 95–112.


  • Heck, D. W. (2019). Accounting for estimation uncertainty and shrinkage in Bayesian within-subject intervals: A comment on Nathoo, Kilshaw, and Masson (2018). Journal of Mathematical Psychology, 88, 27–31.


  • Hoekstra, R., Morey, R. D., Rouder, J. N., & Wagenmakers, E.-J. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21, 1157–1164.


  • Hu, C., Wang, F., Guo, J., Song, M., Sui, J., & Peng, K. (2016). The replication crisis in psychological research. Advances in Psychological Science, 24, 1504–1518.


  • Huynh, H., & Feldt, L. S. (1976). Estimation of the Box correction for degrees of freedom from sample data in randomised block and split-plot designs. Journal of Educational Statistics, 1, 69–82.


  • Jaynes, E. T., & Kempthorne, O. (1976). Confidence intervals vs Bayesian intervals. Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science, 6b, 175–257. https://doi.org/10.1007/978-94-010-1436-6_6

  • Jeffreys, H. (1935). Some tests of significance, treated by the theory of probability. Mathematical Proceedings of the Cambridge Philosophical Society, 31, 203–222.


  • Jeffreys, H. (1936). Further significance tests. Mathematical Proceedings of the Cambridge Philosophical Society, 32, 416–445.


  • Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences, 186, 453–461.


  • Jeffreys, H. (1961). Theory of probability (3rd ed.). Oxford, UK: Oxford University Press. https://global.oup.com/academic/product/theory-of-probability-9780198503682

  • Jusczyk, P. W., Houston, D. M., & Newsome, M. (1999). The beginnings of word segmentation in English-learning infants. Cognitive Psychology, 39, 159–207.


  • Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773–795.


  • Kline, R. B. (2013). Beyond significance testing: Statistics reform in the behavioral sciences (2nd ed.). Washington, D.C.: American Psychological Association. https://doi.org/10.1037/14136-000

  • Kotz, S., & Nadarajah, S. (2004). Multivariate t-distributions and their applications. Cambridge University Press. https://doi.org/10.1017/CBO9780511550683


  • Kruschke, J. K. (2014). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan (2nd ed.). London, UK: Academic Press. https://doi.org/10.1016/B978-0-12-405888-0.09999-2

  • Kruschke, J. K. (2018). Rejecting or accepting parameter values in Bayesian estimation. Advances in Methods and Practices in Psychological Science, 1, 270–280.


  • Kruschke, J. K. (2021). Bayesian analysis reporting guidelines. Nature Human Behaviour, 5, 1282–1291.


  • Lawrence, M. A. (2016). ez: Easy analysis and visualization of factorial experiments. R package version 4.4-0. https://cran.r-project.org/package=ez

  • Lee, M. D., & Wagenmakers, E.-J. (2014). Bayesian cognitive modeling: A practical course. Cambridge University Press. https://doi.org/10.1017/CBO9781139087759


  • Liang, F., Paulo, R., Molina, G., Clyde, M. A., & Berger, J. O. (2008). Mixtures of g priors for Bayesian variable selection. Journal of the American Statistical Association, 103, 410–423.


  • Linde, M., Tendeiro, J., Selker, R., Wagenmakers, E.-J., & van Ravenzwaaij, D. (2021). Decisions about equivalence: A comparison of TOST, HDI-ROPE, and the Bayes factor. Psychological Methods, 1–16. https://doi.org/10.1037/met0000402

  • Lindley, D. V. (1957). A statistical paradox. Biometrika, 44, 187–192. https://doi.org/10.2307/2333251


  • Loftus, G. R., & Masson, M. E. J. (1994). Using confidence intervals in within-subject designs. Psychonomic Bulletin & Review, 1, 476–490.


  • Lovric, M. M. (2020). Conflicts in Bayesian statistics between inference based on credible intervals and Bayes factors. Journal of Modern Applied Statistical Methods, 18, 1–27.


  • Ly, A., Boehm, U., Heathcote, A., Turner, B. M., Forstmann, B., Marsman, M., & Matzke, D. (2017). A flexible and efficient hierarchical Bayesian approach to the exploration of individual differences in cognitive-model-based neuroscience. Computational Models of Brain and Behavior, 467–479. https://doi.org/10.1002/9781119159193.ch34

  • Ly, A., Raj, A., Etz, A., Marsman, M., Gronau, Q. F., & Wagenmakers, E.-J. (2018). Bayesian reanalyses from summary statistics: a guide for academic consumers. Advances in Methods and Practices in Psychological Science, 1, 367–374.


  • Ly, A., Verhagen, J., & Wagenmakers, E.-J. (2016). Harold Jeffreys’s default Bayes factor hypothesis tests: Explanation, extension, and application in psychology. Journal of Mathematical Psychology, 72, 19–32.


  • Maruyama, Y., & George, E. I. (2011). Fully Bayes factors with a generalized g-prior. The Annals of Statistics, 39, 2740–2765.


  • Masson, M. E. J. (2011). A tutorial on a practical Bayesian alternative to null-hypothesis significance testing. Behavior Research Methods, 43, 679–690.


  • Masson, M. E. J., & Loftus, G. R. (2003). Using confidence intervals for graphically based data interpretation. Canadian Journal of Experimental Psychology, 57, 203–220.


  • Morey, R. D. (2015a). Multiple comparisons with BayesFactor, Part 1. R-Bloggers. https://www.r-bloggers.com/2015/01/multiple-comparisons-with-bayesfactor-part-1/

  • Morey, R. D. (2015b). Multiple comparisons with BayesFactor, Part 2 - Order restrictions. BayesFactor. https://bayesfactor.blogspot.com/2015/01/multiple-comparisons-with-bayesfactor-2.html

  • Morey, R. D., Romeijn, J. W., & Rouder, J. N. (2016). The philosophy of Bayes factors and the quantification of statistical evidence. Journal of Mathematical Psychology, 72, 6–18.


  • Morey, R. D., & Rouder, J. N. (2022). BayesFactor: Computation of Bayes factors for common designs. R package version 0.9.12-4.4. https://cran.r-project.org/package=BayesFactor

  • Morey, R. D., Rouder, J. N., Pratte, M. S., & Speckman, P. L. (2011). Using MCMC chain outputs to efficiently estimate Bayes factors. Journal of Mathematical Psychology, 55, 368–378.


  • Nathoo, F. S., Kilshaw, R. E., & Masson, M. E. J. (2018). A better (Bayesian) interval estimate for within-subject designs. Journal of Mathematical Psychology, 86, 1–9.


  • Nathoo, F. S., & Masson, M. E. J. (2016). Bayesian alternatives to null-hypothesis significance testing for repeated-measures designs. Journal of Mathematical Psychology, 72, 144–157.


  • Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology, 25, 111–163.


  • Rouder, J. N., Morey, R. D., Speckman, P. L., & Province, J. M. (2012). Default Bayes factors for ANOVA designs. Journal of Mathematical Psychology, 56, 356–374.


  • Rouder, J. N., Morey, R. D., Verhagen, J., Swagman, A. R., & Wagenmakers, E.-J. (2017). Bayesian analysis of factorial designs. Psychological Methods, 22, 304–321.


  • Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16, 225–237.


  • Schenker, N., & Gentleman, J. F. (2001). On judging the significance of differences by examining the overlap between confidence intervals. The American Statistician, 55, 182–186.


  • Stan Development Team (2023). RStan: The R interface to Stan. R package version 2.21.8. https://mc-stan.org/

  • Urry, H. L., van Reekum, C. M., Johnstone, T., Kalin, N. H., Thurow, M. E., Schaefer, H. S., Jackson, C. A., Frye, C. J., Greischar, L. L., Alexander, A. L., & Davidson, R. J. (2006). Amygdala and ventromedial prefrontal cortex are inversely coupled during regulation of negative affect and predict the diurnal pattern of cortisol secretion among older adults. Journal of Neuroscience, 26, 4415–4425.


  • van den Bergh, D., Wagenmakers, E.-J., & Aust, F. (2022). Bayesian repeated-measures ANOVA: An updated methodology implemented in JASP. PsyArXiv, 1–28. https://doi.org/10.31234/osf.io/fb8zn

  • Vogel, E. K., Woodman, G. F., & Luck, S. J. (2001). Storage of features, conjunctions, and objects in visual working memory. Journal of Experimental Psychology: Human Perception and Performance, 27, 92–114.


  • Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14, 779–804.


  • Wagenmakers, E.-J. (2022). Approximate objective Bayes factors from p-values and sample size: The \(3p\sqrt{n}\) rule. PsyArXiv, 1–50. https://doi.org/10.31234/osf.io/egydq

  • Wagenmakers, E.-J., Gronau, Q. F., Dablander, F., & Etz, A. (2022). The support interval. Erkenntnis, 87, 589–601.


  • Wagenmakers, E.-J., Lodewyckx, T., Kuriyal, H., & Grasman, R. (2010). Bayesian hypothesis testing for psychologists: A tutorial on the Savage-Dickey method. Cognitive Psychology, 60, 158–189.


  • Wagenmakers, E.-J., & Ly, A. (2023). History and nature of the Jeffreys-Lindley paradox. Archive for History of Exact Sciences, 77, 25–72.


  • Wang, M., & Liu, G. (2016). A simple two-sample Bayesian t-test for hypothesis testing. The American Statistician, 70, 195–201.


  • Wang, M., & Sun, X. (2014). Bayes factor consistency for one-way random effects model. Communications in Statistics - Theory and Methods, 43, 5072–5090.


  • Wei, Z., Nathoo, F. S., & Masson, M. E. J. (2022a). rmBayes: Performing Bayesian inference for repeated-measures designs. R package version 0.1.15. https://cran.r-project.org/package=rmBayes

  • Wei, Z., Yang, A., Rocha, L., Miranda, M. F., & Nathoo, F. S. (2022b). A review of Bayesian hypothesis testing and its practical implementations. Entropy, 24, 1–15.


  • Wetzels, R., Matzke, D., Lee, M. D., Rouder, J. N., Iverson, G. J., & Wagenmakers, E.-J. (2011). Statistical evidence in experimental psychology: An empirical comparison using 855 t tests. Perspectives on Psychological Science, 6, 291–298.


  • Zellner, A., & Siow, A. (1980). Posterior odds ratios for selected regression hypotheses. Trabajos de Estadística Y de Investigación Operativa, 31, 585–603.



Acknowledgements

We thank Eric-Jan Wagenmakers for bringing to our attention the potential link between the separation of credible intervals and the Jeffreys-Lindley paradox. We are also grateful for an anonymous referee’s helpful comments on the likelihood principle, the importance of sample size, and evidence for the null hypothesis.

Funding

This work was supported by discovery grants to Farouk S. Nathoo (RGPIN-04044-2020) and Michael E. J. Masson (RGPIN-2015-04773) from the Natural Sciences and Engineering Research Council. Farouk S. Nathoo holds a Tier II Canada Research Chair in Biostatistics for Spatial and High-Dimensional Data.

Author information

Corresponding author

Correspondence to Zhengxiao Wei.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

Proof of Theorem 1

After some substitutions, the Pearson Bayes factor in Equation 9 becomes

$$\text{P-BF}_{10}=\frac{\Gamma \left(\frac{a+1}{2}+\upgamma \right)\cdot \Gamma \left(\frac{a\left(n-1\right)}{2}\right)}{\Gamma \left(\frac{an-1}{2}\right)\cdot \Gamma \left(1+\upgamma \right)}{\left(1+\frac{{{SS}}_{\mathrm{B}}}{{{SS}}_{\mathrm{W}}}\ \right)}^{\frac{a\left(n-1\right)}{2}-1-\upgamma}.$$
(A1)

By applying Stirling’s formula \(\Gamma \left(y+z\right)\sim {y}^{z}\Gamma \left(y\right)\) as \(y\to +\infty\), the gamma ratio in Equation A1 becomes \(\frac{\Gamma \left(\frac{a\left(n-1\right)}{2}\right)}{\Gamma \left(\frac{an-1}{2}\right)}\sim {\left(\frac{an}{2}\right)}^{\frac{1-a}{2}}\) as \(n\to +\infty\).
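This asymptotic equivalence is easy to confirm numerically; the sketch below compares the exact gamma ratio (computed on the log scale with lgamma to avoid overflow) with the Stirling approximation for the a = 2 case considered next:

```r
# Gamma ratio from Equation A1 versus its Stirling approximation
# (an/2)^((1 - a)/2); lgamma avoids overflow for large arguments
gratio   <- function(a, n) exp(lgamma(a * (n - 1) / 2) - lgamma((a * n - 1) / 2))
stirling <- function(a, n) (a * n / 2)^((1 - a) / 2)

# the ratio of the two approaches 1 as n grows (a = 2 shown)
gratio(2, 100)   / stirling(2, 100)
gratio(2, 10000) / stirling(2, 10000)   # very close to 1
```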

We calculated the separation for the standard between-subjects confidence interval in Equation A2 as the absolute difference between the two sample means divided by twice the interval half-width. Here, we consider only a = 2.

$$Sep=\left|{M}_{1\cdot }-{M}_{2\cdot}\right|/\left(2{t}_{1-\frac{\alpha }{2},\ a\left(n-1\right)}^{\ast}\sqrt{\frac{{{SS}}_{\mathrm{W}}}{n\left(n-1\right)a}}\right).$$
(A2)

As \(n\to +\infty\), \({t}_{1-\frac{\alpha }{2},\ a\left(n-1\right)}^{\ast}\sim {z}_{1-\frac{\alpha }{2}}\), the sample means converge to the population means, and the \({\left({SS}_{\mathrm{B}}/{SS}_{\mathrm{W}}\right)}^{2}\) term is assumed to be negligible in the exponential series. \({SS}_{\mathrm{B}}=n\sum_{i=1}^{a}{\left({M}_{i\cdot }-M\right)}^{2}\) reduces to \(\frac{1}{2}n{\left({M}_{1\cdot }-{M}_{2\cdot}\right)}^{2}\) when a = 2. Plugging Equation A2 into the limit below, we obtain

$${\displaystyle \begin{array}{ll} & \underset{n\to +\infty }{\lim}\left[{\left(1+\frac{SS_{\mathrm{B}}}{SS_{\mathrm{W}}}\right)}^{\frac{a\left(n-1\right)}{2}-1-\upgamma}-\exp \left\{{z}_{1-\frac{\upalpha}{2}}^2\cdot {Sep}^2\right\}\right]\\ {}=& \underset{n\to +\infty }{\lim}\left[{\left(1+\frac{SS_{\mathrm{B}}}{SS_{\mathrm{W}}}\right)}^{\frac{a\left(n-1\right)}{2}-1-\upgamma}-\exp \left\{\frac{1}{2}a\left(n-1\right)\frac{SS_{\mathrm{B}}}{SS_{\mathrm{W}}}\right\}\right]\\ {}=& \underset{n\to +\infty }{\lim}\left[{\left(1+\frac{SS_{\mathrm{B}}}{SS_{\mathrm{W}}}\right)}^{\frac{a\left(n-1\right)}{2}-1-\upgamma}-{\left(1+\frac{SS_{\mathrm{B}}}{SS_{\mathrm{W}}}+O\left({\left(\frac{SS_{\mathrm{B}}}{SS_{\mathrm{W}}}\right)}^2\right)\right)}^{\frac{a\left(n-1\right)}{2}}\right]=0.\end{array}}$$

Hence, the asymptotic approximation of the log Pearson Bayes factor is a quadratic function of the separation of the standard confidence interval for population means in balanced one-way between-subjects designs, as the number of subjects goes to infinity. To check the convergence of the limit, Fig. 10 plots the fitted line (solid black) for the relationship between the log JZS-BF10 and the squared separation score, along with the analytic line (dashed red) obtained by plugging the known values into the formula of Theorem 1. As the sample size increases, the two lines grow closer. Some variation between the lines is expected even for very large n because Theorem 1 applies to the Pearson Bayes factor, whereas the quadratic exponential is interpolated for the JZS Bayes factor.
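Theorem 1 can also be probed numerically. The sketch below implements Equations A1 and A2 directly for a = 2 on simulated data; the hyperparameter gamma (carried over from Equation 9), the sample size, and the grid of true mean differences are arbitrary illustrative choices, not values used in the article’s simulations:

```r
# Sketch: log Pearson Bayes factor (Equation A1) against the squared
# separation score (Equation A2) for a = 2 conditions. gamma, n, and the
# grid of effects are illustrative choices.
set.seed(277)
a <- 2; n <- 500; gamma <- 0.5; alpha <- 0.05

log_pbf10 <- function(ssb, ssw) {
  lgamma((a + 1) / 2 + gamma) + lgamma(a * (n - 1) / 2) -
    lgamma((a * n - 1) / 2) - lgamma(1 + gamma) +
    (a * (n - 1) / 2 - 1 - gamma) * log1p(ssb / ssw)
}

sep2 <- log_pbf <- numeric(20)
for (k in 1:20) {
  d <- 0.01 * k                              # true mean difference
  y1 <- rnorm(n, 0, 1); y2 <- rnorm(n, d, 1)
  m <- c(mean(y1), mean(y2)); gm <- mean(m)
  ssb <- n * sum((m - gm)^2)                 # between-condition sum of squares
  ssw <- sum((y1 - m[1])^2) + sum((y2 - m[2])^2)
  tstar <- qt(1 - alpha / 2, a * (n - 1))
  sep <- abs(m[1] - m[2]) /                  # Equation A2
    (2 * tstar * sqrt(ssw / (n * (n - 1) * a)))
  sep2[k]    <- sep^2
  log_pbf[k] <- log_pbf10(ssb, ssw)
}
cor(log_pbf, sep2)   # close to 1
```

Across the grid of effects, the log Pearson Bayes factor is almost perfectly linear in the squared separation score, which is the relationship stated by Theorem 1.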

Fig. 10

Plots of the relationship between the log Bayes factor and the squared separation score of the standard confidence interval for population means in between-subjects designs, with the fitted line in solid black and the analytic line in dashed red

Appendix B

R packages to perform Bayesian inference

The ‘rmBayes’ package performs Bayesian interval estimation for both the homoscedastic and heteroscedastic cases in either between- or within-subject designs that include a single independent variable. The Stan-based R source package installation will take a few minutes because models need to be compiled into dynamic shared objects. We recommend using R version 4.0.1 or later and installing the pre-compiled binary package so users do not have to worry about C++ compiler issues. The relevant commands are:

> install.packages("rmBayes", type = "binary")

> library(rmBayes)

The rmHDI function in ‘rmBayes’ provides multiple methods for constructing the credible intervals for population means, with each method based on a different set of priors. The default method implements the NUTS algorithm and constructs the within-subject HDI corresponding to the JZS-HDI case in Table 1. Further documentation of the available methods can be viewed on GitHub: https://zhengxiaouvic.github.io/rmBayes/. The following example includes a partial data set, a call to the rmHDI function, and the resulting output; the same partial data set is also shown in wide format.

> ## Data are in the long format. 10 subjects. 3 conditions.

> head(recall.long, 2)

Subject Level Response

1 s1 Level1 10

2 s2 Level1 6

> rmHDI(recall.long, whichSubject = "Subject", whichLevel = "Level", whichResponse = "Response", seed = 277) #macOS (Apple chip)

$HDI

lower upper

Level1 10.47101 11.59361

Level2 12.39176 13.51436

Level3 13.55086 14.67346

$`posterior means`

Level1 Level2 Level3

11.03231 12.95306 14.11216

$width

[1] 0.5613014

> ## Same data are in the wide format.

> head(recall.wide, 2)

Level1 Level2 Level3

s1 10 13 13

s2 6 8 8

> rmHDI(data.wide = recall.wide, seed = 277)

An alternative method for computing HDIs is possible using the ‘BayesFactor’ package, which computes Bayes factors for several experimental designs. The anovaBF function can first be used to generate the Bayes factor for a within-subject design.

> library(BayesFactor); set.seed(277)

> anovaBF(Response ~ Level + Subject, data = recall.long, whichRandom = "Subject", iterations = 100000, progress = FALSE)

Bayes factor analysis

--------------

[1] Level + Subject : 36469.12 ±0.32%

Against denominator:

Response ~ Subject

---

Bayes factor type: BFlinearModel, JZS

Then, Gibbs sampling can be used to obtain parameter estimates from the posterior distribution of the Bayes factor object’s numerator model. Those estimates are plugged into the interval equations for the LH- or JZS-HDI in Table 1 to construct the within-subject HDI. This method can be implemented using the R code below, which defines a new anovaHDI function. Both anovaHDI and the default rmHDI assume the same priors but use different sampling algorithms to obtain the posterior distributions.

> anovaHDI <- function(data, whichSubject, whichLevel, whichResponse, cred, iter) {
    # input arguments are defined as in the rmHDI function
    n <- length(unique(data[, whichSubject]))
    a <- length(unique(data[, whichLevel]))
    BF <- BayesFactor::anovaBF(
      as.formula(paste(whichResponse, "~", whichLevel, "+", whichSubject)),
      data = data, whichRandom = whichSubject,
      iterations = iter, progress = FALSE)
    chains <- BayesFactor::posterior(BF, iterations = iter, progress = FALSE)
    # condition means: grand mean (column 1) plus condition effects
    mu.chains <- chains[, 2:(a + 1)] + chains[, 1]
    widths <- qt((1 + cred) / 2, df = a * (n - 1)) * sqrt(chains[, "sig2"] / n)
    uprs <- mu.chains + widths
    lwrs <- mu.chains - widths
    matrix(c(colMeans(lwrs), colMeans(uprs)), nrow = a,
           dimnames = list(paste("Level", 1:a), c("lower", "upper")))
  }

> set.seed(277)

> anovaHDI(recall.long, "Subject", "Level", "Response", .95, 100000)

lower upper

Level 1 10.50752 11.64187

Level 2 12.41498 13.54934

Level 3 13.56208 14.69644

Appendix C

Monte Carlo error and a data permutation issue

Users should expect different results if they vary the number of iterations or the random seed used in MCMC. Such variability is referred to as Monte Carlo error. We examined Monte Carlo error in computing Bayes factors by applying the anovaBF function 500 times (each containing 100,000 MCMC iterations; the default value is 10,000) with different random seeds on the same set of simulated within-subject data. Among these 500 runs, one Bayes factor outlier was as extreme as 11.7, although the vast majority of values ranged from 4.3 to 5.4. R scripts for this and the following examples are available at https://osf.io/x2pvw/.
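To build intuition for Monte Carlo error without requiring the ‘BayesFactor’ package, the following toy R sketch (the quantity and values are illustrative, unrelated to the simulations above) repeats the same simulation-based estimate many times and measures its run-to-run spread:

```r
# Toy illustration of Monte Carlo error: 500 independent estimates of the
# same quantity (the mean of a standard normal, whose true value is 0),
# each based on 10,000 draws, vary from run to run.
set.seed(277)
estimates <- replicate(500, mean(rnorm(1e4)))
range(estimates)   # run-to-run spread around the true value 0
sd(estimates)      # Monte Carlo (standard) error, roughly 1 / sqrt(10000) = 0.01
```

The spread shrinks at the usual 1/sqrt(draws) rate, which is why increasing the number of MCMC iterations reduces, but never removes, seed-to-seed variability.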

Similarly, Monte Carlo error is associated with Bayesian interval estimation. We ran the rmHDI and anovaBF functions 100 times each (each run containing 20,000 MCMC draws – 2 chains with 10,000 iterations each in rmHDI) with different random seeds on the same simulated within-subject data, and visualized the resulting variability of the posterior mean estimates, HDI widths, and HDI separation via density plots. The whole process was repeated several times for data sets having different Bayes factors, correlations between conditions, and sample sizes. One example is exhibited in Fig. 11.

In different realizations of the draws from the posterior distribution, it is also worthwhile to note a data permutation issue that affects the simulation: the same experimental data are used but permuted by row (e.g., Subject 10 moved up to the second position) or by column (e.g., Level-high and Level-low rather than Level-low and Level-high order). Permutation of the entries in a data file will result in slightly different estimates even if the random seed stays the same. To assess the magnitude of the permutation issue relative to Monte Carlo error, we randomly permuted the data by row but fixed the random seed when calling rmHDI. In Fig. 11, the two density plots generated from the rmHDI function in the ‘rmBayes’ package (permuted data with a constant random seed, versus unpermuted data with varying random seeds) overlap substantially, indicating that permuting the data produces variability of a similar magnitude to varying the random seed.

Moreover, the functions in the ‘BayesFactor’ package returned less variability in estimates of the posterior means but more variability in estimates of the standard error of the mean (and thus interval width) and the posterior mean difference, whereas rmHDI showed the opposite pattern. The separation percentage is less variable when calling rmHDI, as shown in panel E of Fig. 11. Although the models and priors assumed by the two packages are identical, differences in implementation, especially for Equations 2 and 4 and the MCMC samplers (Gibbs sampling in anovaBF, NUTS in rmHDI), can lead to these differences in variability.

Fig. 11

Density plots of the simulations from replicating R functions. A: posterior mean difference; B: HDI width; C: posterior mean of one condition; D: posterior mean of the other condition; E: HDI separation percentage. The same random seed was fixed when investigating the permutation issue, denoted as ‘rmBayes (permuted)’, whereas different random seeds were used when investigating the Monte Carlo error in ‘BayesFactor’ and ‘rmBayes’

In the rmHDI function, the default setting for the argument permuted is TRUE, meaning the converted wide-format data are first ordered by their column names in alphabetic order. Then, the data are placed in ascending order by the first and second columns.
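As a sketch of this default preprocessing (our reading of the description above; the data values and column names are made up for illustration), the two ordering steps can be expressed as:

```r
# Reorder a toy wide-format data set as described for permuted = TRUE:
# (1) columns in alphabetical order by name; (2) rows ascending by the
# first and then the second column. Values here are made up.
d <- data.frame(Level2 = c(13, 8), Level1 = c(10, 6), Level3 = c(13, 8))
d <- d[, order(colnames(d))]       # step 1: alphabetical column order
d <- d[order(d[, 1], d[, 2]), ]    # step 2: ascending by first, then second column
d
```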

Appendix D

Warning messages regarding sampling and effective Monte Carlo sample size

The Stan website https://mc-stan.org/misc/warnings lists the potential warnings that can arise when running MCMC. Three common warnings relate to exceeding the maximum tree depth (a concern for long execution time), low bulk effective sample size (ESS, indicating that posterior means and medians may be unreliable), and low tail ESS (indicating that posterior variances and tail quantiles may be unreliable). The relevance of these warnings depends on the specific data being analyzed. Visit https://osf.io/x2pvw/ for an example.

We suspect that a high correlation between conditions in a within-subject design might result in slower, less efficient sampling due to a likelihood surface with elongated elliptical contours. The latter two warnings indicate that the sampler is moving slowly: after accounting for the correlation across successive draws of the Markov chain, the ESS is low. For example, if the lag-1 autocorrelation of the MCMC output is high (e.g., above .97), then 2,000 iterations can be worth, say, fewer than 100 independent draws. The warning disappears with 10,000 iterations because the effective sample size may then be high enough to cross the threshold in Stan (possibly an ESS of approximately 500).
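The arithmetic behind the “fewer than 100 independent draws” remark can be made concrete for an idealized stationary AR(1) chain (a simplification; real MCMC output is not exactly AR(1)), for which the effective sample size has the closed form N(1 − ρ)/(1 + ρ):

```r
# Effective sample size of an idealized stationary AR(1) chain with
# lag-1 autocorrelation rho: ESS = N * (1 - rho) / (1 + rho)
ess_ar1 <- function(N, rho) N * (1 - rho) / (1 + rho)

ess_ar1(2000, 0.97)    # about 30 effective draws from 2,000 iterations
ess_ar1(10000, 0.97)   # about 152 effective draws from 10,000 iterations
```

With ρ = .97, 2,000 iterations correspond to only about 30 effective draws, which illustrates why increasing the iteration count raises the ESS toward whatever threshold Stan applies.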

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wei, Z., Nathoo, F.S. & Masson, M.E.J. Investigating the relationship between the Bayes factor and the separation of credible intervals. Psychon Bull Rev 30, 1759–1781 (2023). https://doi.org/10.3758/s13423-023-02295-1

