Abstract
We examined the relationship between the Bayes factor and the separation of credible intervals in between- and within-subject designs under a range of effect and sample sizes. For the within-subject case, we considered five intervals: (1) the within-subject confidence interval of Loftus and Masson (1994); (2) the within-subject Bayesian interval developed by Nathoo et al. (2018), whose derivation conditions on estimated random effects; (3) and (4) two modifications of (2) based on a proposal by Heck (2019) to allow for shrinkage and account for uncertainty in the estimation of random effects; and (5) the standard Bayesian highest-density interval. We derived and observed through simulations a clear and consistent relationship between the Bayes factor and the separation of credible intervals. Remarkably, for a given sample size, this relationship is described well by a simple quadratic exponential curve and is most precise in case (4). In contrast, interval (5) is relatively wide due to between-subjects variability and is likely to obscure effects when used in within-subject designs, rendering its relationship with the Bayes factor unclear in that case. We discuss how the separation percentage of (4), combined with knowledge of the sample size, could provide evidence in support of either a null or an alternative hypothesis. We also present a case study with example data and provide an R package ‘rmBayes’ to enable computation of each of the within-subject credible intervals investigated here using a number of possible prior distributions.
Data availability
The data used in the analyses are available via the Open Science Framework at https://osf.io/x2pvw/.
Code availability
All R code is available via the Open Science Framework at https://osf.io/x2pvw/.
Notes
Other types of averages can also be used. The root mean square is preferable because it connects the pooled confidence interval width (\(l={t}_{1-\frac{\alpha }{2},\ {n}_1+{n}_2-2}^{\ast}\cdot {s}_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\)) for the difference between means in a two-sample t-test to the confidence interval widths (\({l}_i={t}_{1-\frac{\alpha }{2},\ {n}_1+{n}_2-2}^{\ast}\cdot {s}_p\sqrt{\frac{1}{n_i}}\)) for the population means in an unbalanced one-way ANOVA with two conditions: \({l}^2={l}_1^2+{l}_2^2\), where \(s_p\) is the pooled estimate of the common standard deviation.
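The quadrature relation stated in this note can be verified directly: because each \(l_i\) shares the same critical value and pooled standard deviation,

```latex
l_1^2 + l_2^2
  = \left(t^{*}_{1-\frac{\alpha}{2},\,n_1+n_2-2}\, s_p\right)^{2}
    \left(\frac{1}{n_1} + \frac{1}{n_2}\right)
  = l^2 ,
```

which is exactly why the root mean square, rather than another average, makes the two-sample interval width decompose across the per-condition widths.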
The Stan syntax target += -log(sigma); in place of target += -2*log(sigma); has been implemented in version 0.1.15 and later of the ‘rmBayes’ R package to accurately reflect the Jeffreys prior in Equation 2. Regardless of which syntax is used, there is little difference in graphical results, which can be seen as an example of a sensitivity analysis for different possible priors. ‘rmBayes’ 0.1.15 was used for the computations reported in this article.
By calling rmHDI(recall.long, iter = 2e4, seed = 277)$width, Apple-silicon macOS may return 0.5613043, Intel-based macOS may return 0.5601921, Compute Canada Cedar may return 0.5600443, and Windows may return 0.5589209.
As a sufficient but not necessary condition for conducting repeated-measures ANOVA, the compound symmetry assumption states that all conditions have equal population variance, and all pairs of conditions have equal covariance. Hence, compound symmetry is a restrictive form of circularity. See remarks in Cousineau (2019, p. 232).
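In matrix form (a standard way to write this assumption, not notation taken from the article), compound symmetry constrains the a × a covariance matrix of the conditions to have a common variance \(\sigma^2\) on the diagonal and a common covariance \(\rho\sigma^2\) off the diagonal:

```latex
% Compound symmetry: equal variances and equal pairwise covariances
\boldsymbol{\Sigma}
  = \sigma^2\left[(1-\rho)\,\mathbf{I}_a + \rho\,\mathbf{J}_a\right]
  = \sigma^2
  \begin{pmatrix}
    1      & \rho   & \cdots & \rho   \\
    \rho   & 1      & \cdots & \rho   \\
    \vdots & \vdots & \ddots & \vdots \\
    \rho   & \rho   & \cdots & 1
  \end{pmatrix},
```

where \(\mathbf{I}_a\) is the identity matrix and \(\mathbf{J}_a\) is the all-ones matrix. Any matrix of this form satisfies circularity, but not conversely, which is the sense in which compound symmetry is the more restrictive condition.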
At the time of writing, the ‘BayesFactor’ R package by Morey and Rouder (2022) implemented Equation 13 for multiway within-subject designs, but van den Bergh et al. (2022) have recognized the misspecification and started to update the functions accordingly. See also Kruschke (2014, pp. 606–608). These changes will not affect the one-way models used for the simulations in this article.
References
Armitage, P., Berry, G., & Matthews, J. N. S. (2002). Statistical methods in medical research (4th ed.). Bodmin, UK: Blackwell Science. https://doi.org/10.1002/9780470773666
Bartlett, M. S. (1957). A comment on D. V. Lindley’s statistical paradox. Biometrika, 44, 533–534. https://doi.org/10.1093/biomet/44.3-4.533
Bub, D. N., Masson, M. E., & van Noordenne, M. (2021). Motor representations evoked by objects under varying action intentions. Journal of Experimental Psychology: Human Perception and Performance, 47, 53–80.
Campbell, H., & Gustafson, P. (2021). re: Linde et al. (2021) - The Bayes factor, HDI-ROPE and frequentist equivalence testing are actually all equivalent. arXiv, 1–22. https://doi.org/10.48550/arXiv.2104.07834
Carvalho, C. M., Polson, N. G., & Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika, 97, 465–480.
Casella, G., Ghosh, M., Gill, J., & Kyung, M. (2010). Penalized regression, standard errors, and Bayesian lassos. Bayesian Analysis, 5, 369–411.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York: Routledge. https://doi.org/10.4324/9780203771587
Congdon, P. D. (2019). Bayesian hierarchical models with applications using R (2nd ed.). New York: Chapman and Hall/CRC. https://doi.org/10.1201/9780429113352
Cousineau, D. (2019). Correlation-adjusted standard errors and confidence intervals for within-subject designs: A simple multiplicative approach. The Quantitative Methods for Psychology, 15, 226–241.
Craiu, R. V., Gustafson, P., & Rosenthal, J. S. (2022). Reflections on Bayesian inference and Markov chain Monte Carlo. The Canadian Journal of Statistics, 50, 1213–1227.
Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25, 7–29.
Dienes, Z. (2021). Obtaining evidence for no effect. Collabra. Psychology, 7, 1–15.
Eich, E. (2014). Business not as usual. Psychological Science, 25, 3–6.
Etz, A., & Vandekerckhove, J. (2016). A Bayesian perspective on the reproducibility project: Psychology. PLoS ONE, 11, 1–12.
Evett, I. W. (1987). Bayesian inference and forensic science: Problems and perspectives. Journal of the Royal Statistical Society, 36, 99–105.
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175–191.
Faulkenberry, T. J. (2021). The Pearson Bayes factor: An analytic formula for computing evidential value from minimal summary statistics. Biometrical Letters, 58, 1–26.
Faulkenberry, T. J., & Brennan, K. B. (2022). Computing analytic Bayes factors from summary statistics in repeated-measures designs. arXiv, 1–25. https://doi.org/10.48550/arXiv.2209.08159
Franz, V. H., & Loftus, G. R. (2012). Standard errors and confidence intervals in within-subjects designs: Generalizing Loftus and Masson (1994) and avoiding the biases of alternative accounts. Psychonomic Bulletin & Review, 19, 395–404.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A. & Rubin, D. B. (2013). Bayesian data analysis (3rd ed.). New York: Chapman and Hall/CRC. https://doi.org/10.1201/b16018
Greenhouse, S. W., & Geisser, S. (1959). On methods in the analysis of profile data. Psychometrika, 24, 95–112.
Heck, D. W. (2019). Accounting for estimation uncertainty and shrinkage in Bayesian within-subject intervals: A comment on Nathoo, Kilshaw, and Masson (2018). Journal of Mathematical Psychology, 88, 27–31.
Hoekstra, R., Morey, R. D., Rouder, J. N., & Wagenmakers, E.-J. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21, 1157–1164.
Hu, C., Wang, F., Guo, J., Song, M., Sui, J., & Peng, K. (2016). The replication crisis in psychological research. Advances in Psychological Science, 24, 1504–1518.
Huynh, H., & Feldt, L. S. (1976). Estimation of the Box correction for degrees of freedom from sample data in randomised block and split-plot designs. Journal of Educational Statistics, 1, 69–82.
Jaynes, E. T., & Kempthorne, O. (1976). Confidence intervals vs Bayesian intervals. Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science, 6b, 175–257. https://doi.org/10.1007/978-94-010-1436-6_6
Jeffreys, H. (1935). Some tests of significance, treated by the theory of probability. Mathematical Proceedings of the Cambridge Philosophical Society, 31, 203–222.
Jeffreys, H. (1936). Further significance tests. Mathematical Proceedings of the Cambridge Philosophical Society, 32, 416–445.
Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences, 186, 453–461.
Jeffreys, H. (1961). Theory of probability (3rd ed.). Oxford, UK: Oxford University Press. https://global.oup.com/academic/product/theory-of-probability-9780198503682
Jusczyk, P. W., Houston, D. M., & Newsome, M. (1999). The beginnings of word segmentation in English-learning infants. Cognitive Psychology, 39, 159–207.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773–795.
Kline, R. B. (2013). Beyond significance testing: Statistics reform in the behavioral sciences (2nd ed.). Washington, D.C.: American Psychological Association. https://doi.org/10.1037/14136-000
Kotz, S., & Nadarajah, S. (2004). Multivariate t-distributions and their applications. Cambridge University Press. https://doi.org/10.1017/CBO9780511550683
Kruschke, J. K. (2014). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan (2nd ed.). London, UK: Academic Press. https://doi.org/10.1016/B978-0-12-405888-0.09999-2
Kruschke, J. K. (2018). Rejecting or accepting parameter values in Bayesian estimation. Advances in Methods and Practices in Psychological Science, 1, 270–280.
Kruschke, J. K. (2021). Bayesian analysis reporting guidelines. Nature Human Behaviour, 5, 1282–1291.
Lawrence, M. A. (2016). ez: Easy analysis and visualization of factorial experiments. R package version 4.4-0. https://cran.r-project.org/package=ez
Lee, M. D., & Wagenmakers, E.-J. (2014). Bayesian cognitive modeling: A practical course. Cambridge University Press. https://doi.org/10.1017/CBO9781139087759
Liang, F., Paulo, R., Molina, G., Clyde, M. A., & Berger, J. O. (2008). Mixtures of g priors for Bayesian variable selection. Journal of the American Statistical Association, 103, 410–423.
Linde, M., Tendeiro, J., Selker, R., Wagenmakers, E.-J., & van Ravenzwaaij, D. (2021). Decisions about equivalence: A comparison of TOST, HDI-ROPE, and the Bayes factor. Psychological Methods, 1–16. https://doi.org/10.1037/met0000402
Lindley, D. V. (1957). A statistical paradox. Biometrika, 44, 187–192. https://doi.org/10.2307/2333251
Loftus, G. R., & Masson, M. E. J. (1994). Using confidence intervals in within-subject designs. Psychonomic Bulletin & Review, 1, 476–490.
Lovric, M. M. (2020). Conflicts in Bayesian statistics between inference based on credible intervals and Bayes factors. Journal of Modern Applied Statistical Methods, 18, 1–27.
Ly, A., Boehm, U., Heathcote, A., Turner, B. M., Forstmann, B., Marsman, M., & Matzke, D. (2017). A flexible and efficient hierarchical Bayesian approach to the exploration of individual differences in cognitive-model-based neuroscience. Computational Models of Brain and Behavior, 467–479. https://doi.org/10.1002/9781119159193.ch34
Ly, A., Raj, A., Etz, A., Marsman, M., Gronau, Q. F., & Wagenmakers, E.-J. (2018). Bayesian reanalyses from summary statistics: a guide for academic consumers. Advances in Methods and Practices in Psychological Science, 1, 367–374.
Ly, A., Verhagen, J., & Wagenmakers, E.-J. (2016). Harold Jeffreys’s default Bayes factor hypothesis tests: Explanation, extension, and application in psychology. Journal of Mathematical Psychology, 72, 19–32.
Maruyama, Y., & George, E. I. (2011). Fully Bayes factors with a generalized g-prior. The Annals of Statistics, 39, 2740–2765.
Masson, M. E. J. (2011). A tutorial on a practical Bayesian alternative to null-hypothesis significance testing. Behavior Research Methods, 43, 679–690.
Masson, M. E. J., & Loftus, G. R. (2003). Using confidence intervals for graphically based data interpretation. Canadian Journal of Experimental Psychology, 57, 203–220.
Morey, R. D. (2015a). Multiple comparisons with BayesFactor, Part 1. R-Bloggers. https://www.r-bloggers.com/2015/01/multiple-comparisons-with-bayesfactor-part-1/
Morey, R. D. (2015b). Multiple comparisons with BayesFactor, Part 2 - Order restrictions. BayesFactor. https://bayesfactor.blogspot.com/2015/01/multiple-comparisons-with-bayesfactor-2.html
Morey, R. D., Romeijn, J. W., & Rouder, J. N. (2016). The philosophy of Bayes factors and the quantification of statistical evidence. Journal of Mathematical Psychology, 72, 6–18.
Morey, R. D., & Rouder, J. N. (2022). BayesFactor: Computation of Bayes factors for common designs. R package version 0.9.12-4.4. https://cran.r-project.org/package=BayesFactor
Morey, R. D., Rouder, J. N., Pratte, M. S., & Speckman, P. L. (2011). Using MCMC chain outputs to efficiently estimate Bayes factors. Journal of Mathematical Psychology, 55, 368–378.
Nathoo, F. S., Kilshaw, R. E., & Masson, M. E. J. (2018). A better (Bayesian) interval estimate for within-subject designs. Journal of Mathematical Psychology, 86, 1–9.
Nathoo, F. S., & Masson, M. E. J. (2016). Bayesian alternatives to null-hypothesis significance testing for repeated-measures designs. Journal of Mathematical Psychology, 72, 144–157.
Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology, 25, 111–163.
Rouder, J. N., Morey, R. D., Speckman, P. L., & Province, J. M. (2012). Default Bayes factors for ANOVA designs. Journal of Mathematical Psychology, 56, 356–374.
Rouder, J. N., Morey, R. D., Verhagen, J., Swagman, A. R., & Wagenmakers, E.-J. (2017). Bayesian analysis of factorial designs. Psychological Methods, 22, 304–321.
Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16, 225–237.
Schenker, N., & Gentleman, J. F. (2001). On judging the significance of differences by examining the overlap between confidence intervals. The American Statistician, 55, 182–186.
Stan Development Team (2023). RStan: The R interface to Stan. R package version 2.21.8. https://mc-stan.org/
Urry, H. L., van Reekum, C. M., Johnstone, T., Kalin, N. H., Thurow, M. E., Schaefer, H. S., Jackson, C. A., Frye, C. J., Greischar, L. L., Alexander, A. L., & Davidson, R. J. (2006). Amygdala and ventromedial prefrontal cortex are inversely coupled during regulation of negative affect and predict the diurnal pattern of cortisol secretion among older adults. Journal of Neuroscience, 26, 4415–4425.
van den Bergh, D., Wagenmakers, E.-J., & Aust, F. (2022). Bayesian repeated-measures ANOVA: An updated methodology implemented in JASP. PsyArXiv, 1–28. https://doi.org/10.31234/osf.io/fb8zn
Vogel, E. K., Woodman, G. F., & Luck, S. J. (2001). Storage of features, conjunctions, and objects in visual working memory. Journal of Experimental Psychology: Human Perception and Performance, 27, 92–114.
Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14, 779–804.
Wagenmakers, E.-J. (2022). Approximate objective Bayes factors from p-values and sample size: The \(3p\sqrt{n}\) rule. PsyArXiv, 1–50. https://doi.org/10.31234/osf.io/egydq
Wagenmakers, E.-J., Gronau, Q. F., Dablander, F., & Etz, A. (2022). The support interval. Erkenntnis, 87, 589–601.
Wagenmakers, E.-J., Lodewyckx, T., Kuriyal, H., & Grasman, R. (2010). Bayesian hypothesis testing for psychologists: A tutorial on the Savage-Dickey method. Cognitive Psychology, 60, 158–189.
Wagenmakers, E.-J., & Ly, A. (2023). History and nature of the Jeffreys-Lindley paradox. Archive for History of Exact Sciences, 77, 25–72.
Wang, M., & Liu, G. (2016). A simple two-sample Bayesian t-test for hypothesis testing. The American Statistician, 70, 195–201.
Wang, M., & Sun, X. (2014). Bayes factor consistency for one-way random effects model. Communications in Statistics - Theory and Methods, 43, 5072–5090.
Wei, Z., Nathoo, F. S., & Masson, M. E. J. (2022a). rmBayes: Performing Bayesian inference for repeated-measures designs. R package version 0.1.15. https://cran.r-project.org/package=rmBayes
Wei, Z., Yang, A., Rocha, L., Miranda, M. F., & Nathoo, F. S. (2022b). A review of Bayesian hypothesis testing and its practical implementations. Entropy, 24, 1–15.
Wetzels, R., Matzke, D., Lee, M. D., Rouder, J. N., Iverson, G. J., & Wagenmakers, E.-J. (2011). Statistical evidence in experimental psychology: An empirical comparison using 855 t tests. Perspectives on Psychological Science, 6, 291–298.
Zellner, A., & Siow, A. (1980). Posterior odds ratios for selected regression hypotheses. Trabajos de Estadística Y de Investigación Operativa, 31, 585–603.
Acknowledgements
We thank Eric-Jan Wagenmakers for bringing to our attention the potential link between the separation of credible intervals and the Jeffreys-Lindley paradox. We are also grateful for an anonymous referee’s helpful comments on the likelihood principle, the importance of sample size, and evidence for the null hypothesis.
Funding
This work was supported by discovery grants to Farouk S. Nathoo (RGPIN-04044-2020) and Michael E. J. Masson (RGPIN-2015-04773) from the Natural Sciences and Engineering Research Council. Farouk S. Nathoo holds a Tier II Canada Research Chair in Biostatistics for Spatial and High-Dimensional Data.
Ethics declarations
Conflict of interest
The authors declare no conflicts of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A
Proof of Theorem 1
After some substitutions, the Pearson Bayes factor in Equation 9 becomes
By applying Stirling’s formula \(\Gamma \left(y+z\right)\sim {y}^z\Gamma \left(y\right)\) as y → + ∞, the gamma ratio in Equation A1 becomes \(\frac{\Gamma \left(\frac{a\left(n-1\right)}{2}\right)}{\Gamma \left(\frac{an-1}{2}\right)}\sim {\left(\frac{an}{2}\right)}^{\frac{1-a}{2}}\) as n → + ∞.
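The exponent can be checked by direct substitution into Stirling’s formula: taking \(y = \frac{a(n-1)}{2}\) and \(z = \frac{a-1}{2}\), so that \(y + z = \frac{an-1}{2}\),

```latex
\frac{\Gamma\!\left(\frac{a(n-1)}{2}\right)}{\Gamma\!\left(\frac{an-1}{2}\right)}
  = \frac{\Gamma(y)}{\Gamma(y+z)}
  \sim y^{-z}
  = \left(\frac{a(n-1)}{2}\right)^{\frac{1-a}{2}}
  \sim \left(\frac{an}{2}\right)^{\frac{1-a}{2}},
  \qquad n \to +\infty .
```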
We calculated the separation for the standard between-subjects confidence interval in Equation A2 as the absolute value of the difference between the two sample means divided by twice the interval width. Here, we consider only a = 2.
As n → + ∞, \({t}_{1-\frac{\alpha }{2},\ a\left(n-1\right)}^{\ast}\sim {z}_{1-\frac{\alpha }{2}}\), the sample means converge to the population means, and \({\left({SS}_{\mathrm{B}}/{SS}_{\mathrm{W}}\right)}^2\) is assumed to be the least significant term in the exponential series. \({SS}_{\mathrm{B}}=n\sum_{i=1}^{a}{\left({M}_{i\cdot }-M\right)}^2\) reduces to \(\frac{1}{2}n{\left({M}_{1\cdot }-{M}_{2\cdot}\right)}^2\) when a = 2. Plugging Equation A2 into the limit below, we obtain
Hence, the asymptotic approximation of the log Pearson Bayes factor is a quadratic function of the separation of the standard confidence interval for population means in balanced one-way between-subjects designs, as the number of subjects goes to infinity. To check the convergence of the limit, we plot in Fig. 10 the (solid black) fitted line for the relationship between the log JZS-BF10 and the squared separation score, along with the (dashed red) analytic line obtained by plugging the known values into the formula of Theorem 1. As the sample size increases, the two lines become closer. We still expect some variation between these lines even for very large n because Theorem 1 applies to the Pearson Bayes factor, whereas the quadratic exponential is interpolated for the JZS Bayes factor.
Appendix B
R packages to perform Bayesian inference
The ‘rmBayes’ package performs Bayesian interval estimation for both the homoscedastic and heteroscedastic cases in either between- or within-subject designs that include a single independent variable. The Stan-based R source package installation will take a few minutes because models need to be compiled into dynamic shared objects. We recommend using R version 4.0.1 or later and installing the pre-compiled binary package so users do not have to worry about C++ compiler issues. The relevant commands are:
> install.packages("rmBayes", type = "binary")
> library(rmBayes)
The rmHDI function in ‘rmBayes’ provides multiple methods for constructing credible intervals for population means, each based on a different set of priors. The default method implements the NUTS algorithm and constructs the within-subject HDI corresponding to the JZS-HDI case in Table 1. Further documentation of the methods is available on GitHub, https://zhengxiaouvic.github.io/rmBayes/. The following example includes a partial data set, a call to the rmHDI function, and the resulting output. The partial data set is also shown in wide format.
> ## Data are in the long format. 10 subjects. 3 conditions.
> head(recall.long, 2)
Subject Level Response
1 s1 Level1 10
2 s2 Level1 6
> rmHDI(recall.long, whichSubject = "Subject", whichLevel = "Level", whichResponse = "Response", seed = 277) #macOS (Apple chip)
$HDI
lower upper
Level1 10.47101 11.59361
Level2 12.39176 13.51436
Level3 13.55086 14.67346
$`posterior means`
Level1 Level2 Level3
11.03231 12.95306 14.11216
$width
[1] 0.5613014
> ## Same data are in the wide format.
> head(recall.wide, 2)
Level1 Level2 Level3
s1 10 13 13
s2 6 8 8
> rmHDI(data.wide= recall.wide, seed = 277)
HDIs can alternatively be computed using the ‘BayesFactor’ package, which computes Bayes factors for several experimental designs. The anovaBF function can first be used to generate the Bayes factor for a within-subject design.
> library(BayesFactor); set.seed(277)
> anovaBF(Response ~ Level + Subject, data = recall.long, whichRandom = "Subject", iterations = 100000, progress = FALSE)
Bayes factor analysis
--------------
[1] Level + Subject : 36469.12 ±0.32%
Against denominator:
Response ~ Subject
---
Bayes factor type: BFlinearModel, JZS
Then, Gibbs sampling can be used to obtain parameter estimates from the posterior distribution of the numerator model of the Bayes factor object. These estimates are plugged into the interval equations for the LH- or JZS-HDI in Table 1 to construct the within-subject HDI. This method can be implemented using the R code below, which defines a new anovaHDI function. Both anovaHDI and the default rmHDI assume the same priors but use different sampling algorithms to establish the posterior distributions.
> anovaHDI <- function(data, whichSubject, whichLevel, whichResponse, cred, iter) {
    # input arguments are defined as in the rmHDI function
    n <- length(unique(data[, whichSubject]))  # number of subjects
    a <- length(unique(data[, whichLevel]))    # number of conditions
    BF <- BayesFactor::anovaBF(as.formula(paste(whichResponse, "~", whichLevel, "+", whichSubject)),
                               data = data, whichRandom = whichSubject,
                               iterations = iter, progress = FALSE)
    chains <- BayesFactor::posterior(BF, iterations = iter, progress = FALSE)
    # condition means: grand mean (column 1) plus condition effects (columns 2 to a+1)
    mu.chains <- chains[, 2:(a + 1)] + chains[, 1]
    # interval half-widths from the LH-/JZS-HDI equations in Table 1
    widths <- qt((1 + cred) / 2, df = a * (n - 1)) * sqrt(chains[, "sig2"] / n)
    uprs <- mu.chains + widths
    lwrs <- mu.chains - widths
    matrix(c(colMeans(lwrs), colMeans(uprs)), nrow = a,
           dimnames = list(paste("Level", 1:a), c("lower", "upper")))
  }
> set.seed(277)
> anovaHDI(recall.long, "Subject", "Level", "Response", .95, 100000)
lower upper
Level 1 10.50752 11.64187
Level 2 12.41498 13.54934
Level 3 13.56208 14.69644
Appendix C
Monte Carlo error and a data permutation issue
Users should expect different results if they vary the number of iterations or the random seed used in MCMC. Such variability is referred to as Monte Carlo error. We examined Monte Carlo error in computing Bayes factors by applying the anovaBF function 500 times (each containing 100,000 MCMC iterations; the default value is 10,000) with different random seeds on the same set of simulated within-subject data. Among these 500 runs, one Bayes factor outlier was as extreme as 11.7, although the vast majority of values ranged from 4.3 to 5.4. R scripts for this and the following examples are available at https://osf.io/x2pvw/.
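The authors’ full scripts are available at the OSF link above; a minimal sketch of this kind of seed-variation check might look like the following, with the run count reduced from 500 for brevity and the recall.long example data from ‘rmBayes’ standing in for the simulated data set (both substitutions are ours, for illustration only):

```r
# Sketch: gauge Monte Carlo error in the Bayes factor by recomputing it
# with different random seeds on the same data.
# (Illustrative only: 20 runs instead of 500, and recall.long in place
# of the simulated within-subject data set used in the article.)
library(BayesFactor)
library(rmBayes)  # provides the recall.long example data

bf_runs <- sapply(1:20, function(s) {
  set.seed(s)
  bf <- anovaBF(Response ~ Level + Subject, data = recall.long,
                whichRandom = "Subject", iterations = 1e5, progress = FALSE)
  extractBF(bf)$bf  # numeric Bayes factor for Level + Subject vs. Subject
})
range(bf_runs)  # the spread across seeds reflects the Monte Carlo error
```

The spread of bf_runs, and in particular any extreme values, gives a direct sense of how much the reported Bayes factor depends on the seed.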
Similarly, Monte Carlo error is associated with Bayesian interval estimation. We ran the rmHDI and anovaBF functions 100 times each (each run comprising 20,000 MCMC draws; in rmHDI, 2 chains of 10,000 iterations each) with different random seeds on the same simulated within-subject data, and visualized the resulting variability of the posterior mean estimates, HDI widths, and HDI separation via density plots. The whole process was repeated several times for data sets having different Bayes factors, correlations between conditions, and sample sizes. One example is exhibited in Fig. 11.

In different realizations of the draws from the posterior distribution, it is also worth noting a data permutation issue that affects the simulation: the same experimental data are used but permuted by row (e.g., moving Subject 10 up to the second place) or by column (e.g., Level-high and Level-low rather than Level-low and Level-high order). Permuting the entries in a data file will produce slightly different estimates even if the random seed stays the same. To assess the magnitude of this issue relative to Monte Carlo error, we randomly permuted the data by row while fixing the random seed when calling rmHDI. In Fig. 11, the two density plots (permuted data with a constant random seed, versus unpermuted data with varying random seeds) generated from results of the rmHDI function in the ‘rmBayes’ package overlap substantially, indicating that permuting the data produces variability of a magnitude similar to varying the random seed.

Moreover, the functions in the ‘BayesFactor’ package returned less variability in estimates of the posterior means but more variability in estimates of the standard error of the mean (and thus the interval width) and the posterior mean difference, whereas the performance of rmHDI was quite the opposite. The separation percentage is less variable when calling rmHDI, as shown in panel E of Fig. 11.
Although the models and priors assumed by the two packages are identical, the code implementations differ, particularly for Equations 2 and 4 and the MCMC samplers (Gibbs sampling in anovaBF versus NUTS in rmHDI), leading to differences in the variability of the results.
In the rmHDI function, the default setting for the argument permuted is TRUE, meaning that the converted wide-format data are first ordered by their column names in alphabetical order and then placed in ascending order by the first and second columns.
Appendix D
Warning messages regarding sampling and effective Monte Carlo sample size
The Stan website https://mc-stan.org/misc/warnings lists the potential warnings that can arise when running MCMC. Three common warnings relate to exceeding the maximum tree depth (a concern for long execution times), low bulk effective sample size (ESS, indicating that posterior means and medians may be unreliable), and low tail ESS (indicating that posterior variances and tail quantiles may be unreliable). The relevance of these warnings depends on the specific data being analyzed. Visit https://osf.io/x2pvw/ for an example.
We suspect that a high correlation between conditions in a within-subject design may result in slower, inefficient sampling because the computed likelihood has elongated elliptical contours. The latter two warnings indicate that the sampler is moving slowly: after accounting for the correlation across successive draws of the Markov chain, the ESS is low. For example, if the lag-1 autocorrelation of the MCMC sampling output is high (e.g., above .97), then 2,000 iterations can be worth fewer than, say, 100 independent draws. The warning disappears with 10,000 iterations because the effective sample size may then be sufficiently high to cross the threshold in Stan (perhaps an ESS of approximately 500).
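As a rough illustration of this scaling, if successive draws behaved like an AR(1) process with lag-1 autocorrelation rho, a textbook approximation to the effective sample size would be N(1 − rho)/(1 + rho). This is a simplification we introduce for intuition only; Stan’s bulk- and tail-ESS estimators use the full autocorrelation sum and are more sophisticated.

```r
# Rough AR(1) approximation to effective sample size:
# ESS ~ N * (1 - rho) / (1 + rho), with rho the lag-1 autocorrelation.
# (Not the estimator Stan uses; a back-of-envelope sketch only.)
ess_ar1 <- function(N, rho) N * (1 - rho) / (1 + rho)

ess_ar1(2000, 0.97)   # ~30: far fewer than 100 independent draws
ess_ar1(10000, 0.97)  # ~152: five times the iterations, five times the ESS
```

Under this approximation, ESS grows linearly with the number of iterations, which matches the observation that increasing from 2,000 to 10,000 iterations can push the ESS past Stan’s warning threshold.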
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wei, Z., Nathoo, F.S. & Masson, M.E.J. Investigating the relationship between the Bayes factor and the separation of credible intervals. Psychon Bull Rev 30, 1759–1781 (2023). https://doi.org/10.3758/s13423-023-02295-1