A default Bayesian hypothesis test for mediation


In order to quantify the relationship between multiple variables, researchers often carry out a mediation analysis. In such an analysis, a mediator (e.g., knowledge of a healthy diet) transmits the effect from an independent variable (e.g., classroom instruction on a healthy diet) to a dependent variable (e.g., consumption of fruits and vegetables). Almost all mediation analyses in psychology use frequentist estimation and hypothesis-testing techniques. A recent exception is Yuan and MacKinnon (Psychological Methods, 14, 301–322, 2009), who outlined a Bayesian parameter estimation procedure for mediation analysis. Here we complete the Bayesian alternative to frequentist mediation analysis by specifying a default Bayesian hypothesis test based on the Jeffreys–Zellner–Siow approach. We further extend this default Bayesian test by allowing a comparison to directional or one-sided alternatives, using Markov chain Monte Carlo techniques implemented in JAGS. All Bayesian tests are implemented in the R package BayesMed (Nuijten, Wetzels, Matzke, Dolan, & Wagenmakers, 2014).


  1. We generated data that covaried exactly according to the input covariance matrix. Because the covariances of the data were equal to the covariances of the population, there was no need to control for random sampling, and we simulated only one experiment per scenario. The full simulation code is available in the supplemental materials.

  2. The approximation can be made arbitrarily close by increasing the number of MCMC samples.

  3. We thank an anonymous reviewer for pointing this out to us.

  4. We compared the fit of four distributions: a nonstandardized t-distribution, a normal distribution, a nonparametric distribution estimated with the spline interpolation function splinefun in R, and a nonparametric distribution estimated with the R function logspline, which also uses splines to estimate the log density. All four distributions fitted reasonably well: The Bayes factors of the analytical test and the SD method are similar for all four posterior distributions. All four distributions are therefore included in the R package BayesMed and can be used when applying the SD method.


  1. Armstrong, A. M., & Dienes, Z. (2013). Subliminal understanding of negation: Unconscious control by subliminal processing of word pairs. Consciousness and Cognition, 22, 1022–1040.

  2. Berger, J. O. (2006). Bayes factors. In S. Kotz, N. Balakrishnan, C. Read, B. Vidakovic, & N. L. Johnson (Eds.), Encyclopedia of statistical sciences, vol. 1 (2nd ed., pp. 378–386). Hoboken, NJ: Wiley.

  3. Berger, J. O., & Delampady, M. (1987). Testing precise hypotheses. Statistical Science, 2, 317–352.

  4. Berger, J. O., & Wolpert, R. L. (1988). The likelihood principle (2nd ed.). Hayward, CA: Institute of Mathematical Statistics.

  5. Consonni, G., Forster, J. J., & La Rocca, L. (2013). The whetstone and the alum block: Balanced objective Bayesian comparison of nested models for discrete data. Statistical Science, 28, 398–423.

  6. Dickey, J. M., & Lientz, B. P. (1970). The weighted likelihood ratio, sharp hypotheses about chances, the order of a Markov chain. Annals of Mathematical Statistics, 41, 214–226.

  7. Dienes, Z. (2008). Understanding psychology as a science: An introduction to scientific and statistical inference. New York: Palgrave MacMillan.

  8. Dienes, Z. (2011). Bayesian versus orthodox statistics: Which side are you on? Perspectives on Psychological Science, 6, 274–290.

  9. Edwards, W., Lindman, H., & Savage, L. J. (1963). Bayesian statistical inference for psychological research. Psychological Review, 70, 193–242.

  10. Elliot, D. L., Goldberg, L., Kuehl, K. S., Moe, E. L., Breger, R. K., & Pickering, M. A. (2007). The PHLAME (Promoting Healthy Lifestyles: Alternative Models’ Effects) firefighter study: Outcomes of two models of behavior change. Journal of Occupational and Environmental Medicine, 49(2), 204–213.

  11. Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82, 711–732.

  12. Guo, X., Li, F., Yang, Z., & Dienes, Z. (2013). Bidirectional transfer between metaphorical related domains in implicit learning of form-meaning connections. PLoS ONE, 8, e68100.

  13. Hoijtink, H., Klugkist, I., & Boelen, P. (2008). Bayesian evaluation of informative hypotheses. New York: Springer.

  14. Iverson, G. J., Wagenmakers, E. J., & Lee, M. D. (2010). A model averaging approach to replication: The case of p_rep. Psychological Methods, 15, 172–181.

  15. Jeffreys, H. (1961). Theory of probability (3rd ed.). Oxford, UK: Oxford University Press.

  16. Kass, R. E., & Wasserman, L. (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association, 90, 928–934.

  17. Klugkist, I., Laudy, O., & Hoijtink, H. (2005). Inequality constrained analysis of variance: A Bayesian approach. Psychological Methods, 10, 477.

  18. Kruschke, J. K. (2010). Doing Bayesian data analysis: A tutorial introduction with R and BUGS. Burlington, MA: Academic Press.

  19. Lee, M. D., & Wagenmakers, E. J. (2013). Bayesian modeling for cognitive science: A practical course. Germany: Cambridge University Press.

  20. Lewis, S. M., & Raftery, A. E. (1997). Estimating Bayes factors via posterior simulation with the Laplace–Metropolis estimator. Journal of the American Statistical Association, 92, 648–655.

  21. Liang, F., Paulo, R., Molina, G., Clyde, M. A., & Berger, J. O. (2008). Mixtures of g priors for Bayesian variable selection. Journal of the American Statistical Association, 103, 410–423.

  22. Lindley, D. V. (1957). A statistical paradox. Biometrika, 44, 187–192.

  23. MacKinnon, D. P., Fairchild, A., & Fritz, M. (2007). Mediation analysis. Annual Review of Psychology, 58, 593.

  24. MacKinnon, D. P., Lockwood, C. M., & Hoffman, J. (1998). A new method to test for mediation. Paper presented at the annual meeting of the Society for Prevention Research, Park City, UT.

  25. MacKinnon, D. P., Lockwood, C., Hoffman, J., West, S., & Sheets, V. (2002). A comparison of methods to test mediation and other intervening variable effects. Psychological Methods, 7, 83–104.

  26. MacKinnon, D. P., Lockwood, C. M., & Williams, J. (2004). Confidence limits for the indirect effect: Distribution of the product and resampling methods. Multivariate Behavioral Research, 39, 99–128.

  27. MacKinnon, D. P., Warsi, G., & Dwyer, J. H. (1995). A simulation study of mediated effect measures. Multivariate Behavioral Research, 30, 41–62.

  28. Morey, R. D., & Rouder, J. N. (2011). Bayes factor approaches for testing interval null hypotheses. Psychological Methods, 16, 406–419.

  29. Morey, R. D., & Wagenmakers, E. J. (2014). Simple relation between one-sided and two-sided Bayesian point-null hypothesis tests. Manuscript submitted for publication.

  30. Myung, I. J., & Pitt, M. A. (1997). Applying Occam’s razor in modeling cognition: A Bayesian approach. Psychonomic Bulletin & Review, 4, 79–95.

  31. Nuijten, M. B., Wetzels, R., Matzke, D., Dolan, C. V., & Wagenmakers, E. J. (2014). BayesMed: Default Bayesian hypothesis tests for correlation, partial correlation, and mediation. R package version 1.0. http://CRAN.R-project.org/package=BayesMed

  32. O’Hagan, A., & Forster, J. (2004). Kendall’s advanced theory of statistics vol. 2B: Bayesian inference (2nd ed.). London: Arnold.

  33. Overstall, A. M., & Forster, J. J. (2010). Default Bayesian model determination methods for generalised linear mixed models. Computational Statistics & Data Analysis, 54, 3269–3288.

  34. Pericchi, L. R., Liu, G., & Torres, D. (2008). Objective Bayes factors for informative hypotheses: “Completing” the informative hypothesis and “splitting” the Bayes factor. In H. Hoijtink, I. Klugkist, & P. A. Boelen (Eds.), Bayesian evaluation of informative hypotheses (pp. 131–154). New York: Springer Verlag.

  35. Plummer, M. (2009). JAGS version 1.0.3 manual. URL: http://www-ice.iarc.fr/~martyn/software/jags/jags_user_manual.pdf

  36. R Core Team. (2012). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0. URL: http://www.R-project.org/

  37. Rouder, J. N., & Morey, R. D. (2012). Default Bayes factors for model selection in regression. Multivariate Behavioral Research, 47, 877–903.

  38. Rouder, J. N., Morey, R. D., Speckman, P. L., & Province, J. M. (2012). Default Bayes factors for ANOVA designs. Journal of Mathematical Psychology, 56, 356–374.

  39. Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16, 225–237.

  40. Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.

  41. Sellke, T., Bayarri, M. J., & Berger, J. O. (2001). Calibration of p values for testing precise null hypotheses. The American Statistician, 55, 62–71.

  42. Semmens-Wheeler, R., Dienes, Z., & Duka, T. (2013). Alcohol increases hypnotic susceptibility. Consciousness and Cognition, 22(3), 1082–1091.

  43. Sobel, M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. Sociological Methodology, 13, 290–312.

  44. Vandekerckhove, J., Matzke, D., & Wagenmakers, E. J. (in press). Model comparison and the principle of parsimony. In J. Busemeyer, J. Townsend, Z. J. Wang, & A. Eidels (Eds.), Oxford handbook of computational and mathematical psychology. Oxford University Press.

  45. Venzon, D., & Moolgavkar, S. (1988). A method for computing profile-likelihood-based confidence intervals. Applied Statistics, 37(1), 87–94.

  46. Verhagen, J., & Wagenmakers, E. J. (in press). A Bayesian test to quantify the success or failure of a replication attempt. Journal of Experimental Psychology: General.

  47. Wagenmakers, E. J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14, 779–804.

  48. Wagenmakers, E. J., & Grünwald, P. (2006). A Bayesian perspective on hypothesis testing. Psychological Science, 17, 641–642.

  49. Wagenmakers, E. J., Lodewyckx, T., Kuriyal, H., & Grasman, R. (2010). Bayesian hypothesis testing for psychologists: A tutorial on the Savage–Dickey method. Cognitive Psychology, 60, 158–189.

  50. Wagenmakers, E. J., Wetzels, R., Borsboom, D., & van der Maas, H. L. J. (2011). Why psychologists must change the way they analyze their data: The case of psi. Journal of Personality and Social Psychology, 100, 426–432.

  51. Wetzels, R., Grasman, R. P. P. P., & Wagenmakers, E. J. (2010). An encompassing prior generalization of the Savage–Dickey density ratio test. Computational Statistics & Data Analysis, 54, 2094–2102.

  52. Wetzels, R., Grasman, R. P. P. P., & Wagenmakers, E. J. (2012). A default Bayesian hypothesis test for ANOVA designs. The American Statistician, 66, 104–111.

  53. Wetzels, R., Matzke, D., Lee, M. D., Rouder, J. N., Iverson, G. J., & Wagenmakers, E. J. (2011). Statistical evidence in experimental psychology: An empirical comparison using 855 t tests. Perspectives on Psychological Science, 6, 291–298.

  54. Wetzels, R., Raaijmakers, J. G. W., Jakab, E., & Wagenmakers, E. J. (2009). How to quantify support for and against the null hypothesis: A flexible WinBUGS implementation of a default Bayesian t test. Psychonomic Bulletin & Review, 16, 752–760.

  55. Wetzels, R., & Wagenmakers, E. J. (2012). A default Bayesian hypothesis test for correlations and partial correlations. Psychonomic Bulletin & Review, 19, 1057–1064.

  56. Yuan, Y., & MacKinnon, D. P. (2009). Bayesian mediation analysis. Psychological Methods, 14, 301–322.

  57. Zellner, A., & Siow, A. (1980). Posterior odds ratios for selected regression hypotheses. In J. M. Bernardo, M. H. DeGroot, D. V. Lindley, & A. F. M. Smith (Eds.), Bayesian statistics (pp. 585–603). Valencia: University Press.


This research was supported by an ERC grant from the European Research Council. Conor V. Dolan is supported by the European Research Council (Genetics of Mental Illness; grant number: ERC–230374). Ruud Wetzels is supported by the Dutch national program COMMIT.

Author information



Corresponding author

Correspondence to Eric-Jan Wagenmakers.



Appendix 1. JAGS code

JAGS code for correlation


JAGS code for partial correlation


Appendix 2. Testing the correctness of our JAGS implementation

To assess the correctness of our JAGS implementation, we compared the analytical results for the two-sided Bayes factor against the Savage–Dickey density ratio results based on the MCMC samples from JAGS. The distribution that fit the posterior samples best (see Footnote 4) is the nonstandardized t-distribution with the following density:

$$ p\left(x \mid \nu, \mu, \sigma\right)=\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\pi\nu}\,\sigma}\left(1+\frac{1}{\nu}\left(\frac{x-\mu}{\sigma}\right)^2\right)^{-\frac{\nu+1}{2}}, $$

with ν degrees of freedom, location parameter μ, and scale parameter σ. With the samples of the parameter of interest, we can estimate ν, μ, and σ and, thus, the exact shape of the distribution and the exact height of the distribution at the point of interest.
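As an illustration of this last step, the following Python sketch evaluates the nonstandardized t density and forms a Savage–Dickey density ratio from it. It is not the BayesMed implementation. The function names are hypothetical, the density is written with σ as a scale parameter, so that the normalizing constant is Γ((ν+1)/2) / [Γ(ν/2) √(πν) σ], and the values of ν, μ, σ and of the prior height at the test value are made-up numbers standing in for quantities that would be estimated from MCMC samples and taken from the prior actually used.

```python
import math


def nonstd_t_pdf(x, nu, mu, sigma):
    """Density of the nonstandardized (location-scale) t-distribution.

    nu: degrees of freedom, mu: location, sigma: scale (all assumed > 0
    where applicable). Computed on the log scale for numerical stability.
    """
    z = (x - mu) / sigma
    log_norm = (
        math.lgamma((nu + 1.0) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * math.log(math.pi * nu)
        - math.log(sigma)
    )
    return math.exp(log_norm - (nu + 1.0) / 2.0 * math.log1p(z * z / nu))


def savage_dickey_bf01(prior_height, nu, mu, sigma, point=0.0):
    """Savage-Dickey ratio: posterior density at `point` over prior density there.

    The posterior density is approximated by a nonstandardized t-distribution
    with the supplied (fitted) parameters.
    """
    return nonstd_t_pdf(point, nu, mu, sigma) / prior_height


if __name__ == "__main__":
    # Hypothetical fitted parameters; in practice these would be estimated
    # from the MCMC samples of the parameter of interest.
    posterior_fit = {"nu": 8.0, "mu": 0.40, "sigma": 0.15}
    # Assumption for illustration only: prior height at 0 equal to that of a
    # standard Cauchy density, 1 / pi.
    prior_height_at_zero = 1.0 / math.pi
    bf01 = savage_dickey_bf01(prior_height_at_zero, **posterior_fit)
    print(f"BF01 = {bf01:.4f}, BF10 = {1.0 / bf01:.4f}")
```

A quick check on the normalizing constant: with ν = 1, μ = 0, and σ = 1 the density reduces to the standard Cauchy, whose height at 0 is 1/π.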

We checked the fit of this distribution and the performance of the SD method in a small simulation study. We considered the following sample sizes: N = 20, 40, 80, or 160. We simulated correlational data by drawing N values for X from a standard normal distribution, and conditional on X, we simulated values for Y according to the following equation:

$$ Y_i=\beta_0+\tau X_i+\varepsilon_i, $$

where the subscript i denotes subject i and τ represents the relation between X and Y. For each of the four sample sizes, we generated 100 data sets, in each of which τ was drawn from a standard uniform distribution.
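The data-generating scheme just described can be sketched in Python as follows. Several details are assumptions made for this sketch, because the text does not specify them: β0 is set to 0, the errors are drawn from a standard normal distribution, the seed is arbitrary, and the helper names are hypothetical. The sketch only generates the data sets and reports sample correlations. It does not compute Bayes factors.

```python
import math
import random


def simulate_dataset(n, rng, beta0=0.0, error_sd=1.0):
    """Simulate one correlational data set of size n.

    tau is drawn from a standard uniform distribution and X from a standard
    normal. beta0 = 0 and normal errors with SD 1 are assumptions of this
    sketch, not values taken from the article.
    """
    tau = rng.random()
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta0 + tau * xi + rng.gauss(0.0, error_sd) for xi in x]
    return tau, x, y


def pearson_r(x, y):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2014)  # arbitrary seed, for reproducibility only
sample_sizes = (20, 40, 80, 160)
# 100 data sets per sample size, each with its own draw of tau.
datasets = {n: [simulate_dataset(n, rng) for _ in range(100)] for n in sample_sizes}

if __name__ == "__main__":
    for n in sample_sizes:
        rs = [pearson_r(x, y) for _, x, y in datasets[n]]
        print(f"N = {n:3d}: mean sample correlation = {sum(rs) / len(rs):.3f}")
```

Each simulated data set could then be passed both to an analytical correlation test and to the SD method, so that the two resulting Bayes factors can be compared per data set.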

Next, we tested the correlation in each data set with both the analytical Bayesian correlation test and the SD method with the nonstandardized t-distribution and compared the results. The results are shown in Fig. 3. The figure shows that the proposed SD method performs well: The Bayes factors of the analytical test and the SD method are similar for all sample sizes and correlations.

Fig. 3

Natural logarithm of the Bayes factors for correlation obtained with analytical calculations (x axis) or obtained with the SD method based on a nonstandardized t-distribution (y axis) for different sample sizes (N). The graphs show fewer points as the samples grow larger, because in these situations, there are more extreme Bayes factors that fall outside the axis limits. We restricted the graphs, since it is most important that the lower Bayes factors lie on the diagonal; it is not important whether a Bayes factor is 2,000 or 3,000, since it is overwhelming evidence in any case.

Cite this article

Nuijten, M.B., Wetzels, R., Matzke, D. et al. A default Bayesian hypothesis test for mediation. Behav Res 47, 85–97 (2015). https://doi.org/10.3758/s13428-014-0470-2


  • Bayes factor
  • Evidence
  • Mediated effects