Using Bayes factor hypothesis testing in neuroscience to establish evidence of absence

Keysers, Christian; Gazzola, Valeria; Wagenmakers, Eric-Jan

doi:10.1038/s41593-020-0660-4

Review Article
Published: 29 June 2020

Volume 23, pages 788–799 (2020)

An Author Correction to this article was published on 06 October 2020

This article has been updated

Abstract

Most neuroscientists would agree that for brain research to progress, we have to know which experimental manipulations have no effect as much as we must identify those that do have an effect. The dominant statistical approaches used in neuroscience rely on P values and can establish the latter but not the former. This makes non-significant findings difficult to interpret: do they support the null hypothesis or are they simply not informative? Here we show how Bayesian hypothesis testing can be used in neuroscience studies to establish both whether there is evidence of absence and whether there is absence of evidence. Through simple tutorial-style examples of Bayesian t-tests and ANOVA using the open-source project JASP, this article aims to empower neuroscientists to use this approach to provide compelling and rigorous evidence for the absence of an effect.


**Fig. 1: P value of a t-test and BF₊₀ as a function of effect size and sample size.**

**Fig. 2: Hypothesis testing under the Bayesian framework.**

**Fig. 3: Further outputs for the Bayesian t-test on muscimol1.csv.**

**Fig. 4: Illustration of the data for the two simulated scenarios.**

**Fig. 5: Screenshot from the ‘Bayesian Independent Samples T-Test’ in JASP.**

**Fig. 6: Screenshot of the Bayesian repeated measures ANOVA of muscimol1.**


Data availability

All data and code can be downloaded at https://osf.io/md9kp/.

Change history

06 October 2020
An amendment to this paper has been published and can be accessed via a link at the top of the paper.

References

1. Benjamin, D. J. et al. Redefine statistical significance. Nat. Hum. Behav. 2, 6–10 (2018).
2. Dienes, Z. Using Bayes to get the most out of non-significant results. Front. Psychol. 5, 781 (2014).
3. Gallistel, C. R. The importance of proving the null. Psychol. Rev. 116, 439–453 (2009).
4. Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D. & Iverson, G. Bayesian t tests for accepting and rejecting the null hypothesis. Psychon. Bull. Rev. 16, 225–237 (2009).
5. Love, J. et al. JASP: Graphical statistical software for common statistical designs. J. Stat. Softw. 88, 1–17 (2019).
6. Wagenmakers, E.-J. et al. The need for Bayesian hypothesis testing in psychological science. in Psychological Science Under Scrutiny: Recent Challenges and Proposed Solutions (eds. Lilienfeld, S. O. & Waldman, I.) 123–138 (Wiley, 2017).
7. Altman, D. G. & Bland, J. M. Absence of evidence is not evidence of absence. Br. Med. J. 311, 485 (1995).
8. Edwards, W., Lindman, H. & Savage, L. J. Bayesian statistical inference for psychological research. Psychol. Rev. 70, 193–242 (1963).
9. Jeffreys, H. Theory of Probability (Oxford University Press, 1961).
10. Szucs, D. & Ioannidis, J. P. A. Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature. PLoS Biol. 15, e2000797 (2017).
11. Etz, A. & Wagenmakers, E.-J. J. B. S. Haldane’s contribution to the Bayes factor hypothesis test. Stat. Sci. 32, 313–329 (2017).
12. Kass, R. E. & Raftery, A. E. Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995).
13. Lee, M. D. & Wagenmakers, E.-J. Bayesian Cognitive Modeling: A Practical Course (Cambridge University Press, 2013).
14. Morey, R. D. & Rouder, J. N. BayesFactor: computation of Bayes factors for common designs. v. 0.9.12–4.2 https://cran.r-project.org/package=BayesFactor (2018).
15. Carrillo, M. et al. Emotional mirror neurons in the rat’s anterior cingulate cortex. Curr. Biol. 29, 1301–1312.e6 (2019).
16. Jeffreys, H. Theory of Probability (Oxford University Press, 1939).
17. Nieuwenhuis, S., Forstmann, B. U. & Wagenmakers, E.-J. Erroneous analyses of interactions in neuroscience: a problem of significance. Nat. Neurosci. 14, 1105–1107 (2011).
18. Gelman, A. & Stern, H. The difference between “significant” and “not significant” is not itself statistically significant. Am. Stat. 60, 328–331 (2006).
19. Morey, R. D. & Rouder, J. N. Bayes factor approaches for testing interval null hypotheses. Psychol. Methods 16, 406–419 (2011).
20. Rouder, J. N., Morey, R. D., Speckman, P. L. & Province, J. M. Default Bayes factors for ANOVA designs. J. Math. Psychol. 56, 356–374 (2012).
21. Rouder, J. N., Engelhardt, C. R., McCabe, S. & Morey, R. D. Model comparison in ANOVA. Psychon. Bull. Rev. 23, 1779–1786 (2016).
22. Myung, I. J. & Pitt, M. A. Applying Occam’s razor in modeling cognition: a Bayesian approach. Psychon. Bull. Rev. 4, 79–95 (1997).
23. Efron, B. Why isn’t everyone a Bayesian? Am. Stat. 40, 1–5 (1986).
24. Lee, M. D. & Vanpaemel, W. Determining informative priors for cognitive models. Psychon. Bull. Rev. 25, 114–127 (2018).
25. Bayarri, M. J., Berger, J. O., Forte, A. & Garcia-Donato, G. Criteria for Bayesian model choice with application to variable selection. Ann. Stat. 40, 1550–1577 (2012).
26. Cremers, H. R., Wager, T. D. & Yarkoni, T. The relation between statistical power and inference in fMRI. PLoS ONE 12, e0184923 (2017).
27. Morey, R. D., Hoekstra, R., Rouder, J. N., Lee, M. D. & Wagenmakers, E.-J. The fallacy of placing confidence in confidence intervals. Psychon. Bull. Rev. 23, 103–123 (2016).
28. Marsman, M., Waldorp, L., Dablander, F. & Wagenmakers, E.-J. Bayesian estimation of explained variance in ANOVA designs. Stat. Neerl. 73, 351–372 (2019).
29. van Doorn, J., Marsman, M., Ly, A. & Wagenmakers, E.-J. Bayesian rank-based hypothesis testing for the rank sum test, the signed rank test, and Spearman’s ρ. J. Appl. Stat. https://doi.org/10.1080/02664763.2019.1709053 (2020).
30. Wagenmakers, E.-J., Morey, R. D. & Lee, M. D. Bayesian benefits for the pragmatic researcher. Curr. Dir. Psychol. Sci. 25, 169–176 (2016).
31. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (The MIT Press, 1998).
32. Wrinch, D. & Jeffreys, H. On certain fundamental principles of scientific inquiry. Philos. Mag. 42, 368–374 (1923).
33. Rozeboom, W. W. The fallacy of the null-hypothesis significance test. Psychol. Bull. 57, 416–428 (1960).
34. Stefan, A. M., Gronau, Q. F., Schönbrodt, F. D. & Wagenmakers, E.-J. A tutorial on Bayes factor design analysis using an informed prior. Behav. Res. Methods 51, 1042–1058 (2019).
35. Wagenmakers, E.-J. et al. Bayesian inference for psychology. Part II: example applications with JASP. Psychon. Bull. Rev. 25, 58–76 (2018).
36. Rouder, J. N. Optional stopping: no problem for Bayesians. Psychon. Bull. Rev. 21, 301–308 (2014).
37. Schönbrodt, F. D. & Wagenmakers, E.-J. Bayes factor design analysis: planning for compelling evidence. Psychon. Bull. Rev. 25, 128–142 (2018).
38. Consonni, G., Fouskakis, D., Liseo, B. & Ntzoufras, I. Prior distributions for objective Bayesian analysis. Bayesian Anal. 13, 627–679 (2018).
39. Gronau, Q. F., Ly, A. & Wagenmakers, E.-J. Informed Bayesian t-tests. Am. Stat. 74, 137–143 (2019).


Acknowledgements

C.K. is funded by NWO VICI grant 453-15-009; V.G. is funded by ERC grant 758703 and NWO VIDI grant 452-14-015; E.J.W. is funded by NWO VICI grant 453-16-003. We thank F. Bartos for help with Fig. 2.

Author information

Authors and Affiliations

Netherlands Institute for Neuroscience, Royal Netherlands Academy of Arts and Sciences, Amsterdam, The Netherlands
Christian Keysers & Valeria Gazzola
Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
Christian Keysers, Valeria Gazzola & Eric-Jan Wagenmakers

Authors

Christian Keysers
Valeria Gazzola
Eric-Jan Wagenmakers

Contributions

All authors conceived the project together and contributed to the writing of the manuscript. E.J.W. coordinates the development of JASP.

Corresponding author

Correspondence to Christian Keysers.

Ethics declarations

Competing interests

E.J.W. declares that he coordinates the development of the open-source software package JASP (https://jasp-stats.org), a non-commercial, publicly-funded effort to make Bayesian statistics accessible to a broader group of researchers and students. C.K. and V.G. declare no competing interests.

Additional information

Peer review information Nature Neuroscience thanks Denise Cai, Zhe Dong, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 The relationship between BF, p, and effect size values.

a, This log-log plot shows the BF₊₀ values corresponding to familiar critical p values for a one-tailed one-sample t-test at different sample sizes (n). The curves show the BF₊₀ values obtained in a Bayesian t-test based on the critical t-value that provides P=0.05 (yellow), P=0.01 (green), P=0.005 (black) and P=0.001 (black). The yellow dashed horizontal line indicates the BF₊₀=3 bound for moderate evidence, considered by Jeffreys⁹ to be similar to P=0.05; the green one indicates the BF₊₀=10 bound for strong evidence, considered similar to P=0.01. The two black dashed lines mark BF₊₀=1, i.e. the line of no evidence, and BF₊₀=1/3, the bound for moderate evidence of absence. The background gradient reminds the reader that the BF reference values of 3 and 10 should not be considered hard bounds. Instead, the BF should be interpreted as a continuous value, with values diverging more from 1 supporting stronger conclusions.

This panel makes two points. First, there is no simple equivalence between p and BF that holds over all sample sizes. This is because in a frequentist t-test, the observed effect size (d) sufficient to generate a specific p value decreases with \(\sqrt{n}\) more rapidly than for the BF. As a result, at large n, very small effect sizes generate a ‘significant’ t-test: at n=1000, the critical t-value for a one-tailed P=0.05 is 1.65, corresponding to \(d = 1.65/\sqrt{n} \approx 0.05\). For the BF, such a minuscule effect is about 4 times more likely under H₀ than under H₊ (BF₊₀=0.26). Hence, for small sample sizes p and BF support similar conclusions (e.g., P=0.05 at n=4 corresponds to BF₊₀>3, supporting the same conclusion of evidence for an effect), but for large sample sizes the frequentist and Bayesian conclusions can diverge in the presence of very small effect sizes (e.g., P=0.05 at n=1000 corresponds to BF₊₀<1/3; see Jeffreys, H. Some Tests of Significance, Treated by the Theory of Probability. Proc. Cambridge Philos. Soc. 31, 203–222 (1935)). Considering confidence or credible intervals of the effect size in addition to p or BF values helps interpret such cases. Second, the fact that the dashed lines are above the curve of the same color for all n>4 shows that BF₊₀=3 and BF₊₀=10 indeed protect against Type I errors in a frequentist sense at least at P=0.05 or P=0.01, respectively. In other words, if BF₁₀>3, then p<0.05, and if BF₁₀>10, then p<0.01, but how much lower than 0.05 or 0.01 the exact P value is depends on n.

b, BF₊₀ (left) and p (right) values as a function of measured effect size and sample size. These panels illustrate the measured effect sizes necessary to provide evidence for an effect at different sample sizes in a one-sample one-tailed t-test using the BF vs. traditional p values. Each curve connects the results at different sample sizes for the specified value of d. The logarithmic BF and p scales are aligned so as to place BF=3 next to P=0.05, and BF=10 next to P=0.01.
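The n=1000 example in panel a can be checked roughly with a few lines of Python. The sketch below is not the authors' computation. It replaces the exact t likelihood with a large-sample normal approximation (t is treated as normally distributed around \(\delta\sqrt{n}\) with unit variance), and it assumes a one-sided Cauchy prior on the standardized effect size δ with scale 0.707, a value the caption does not state. The function names are hypothetical.

```python
import math


def normal_pdf(x, mean=0.0):
    """Density of a unit-variance normal distribution at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def half_cauchy_pdf(delta, scale):
    """Cauchy(0, scale) prior restricted to delta >= 0 (density doubled)."""
    return 2.0 / (math.pi * scale * (1.0 + (delta / scale) ** 2))


def bf_plus0_normal_approx(t, n, scale=math.sqrt(2.0) / 2.0, steps=20000):
    """Approximate one-sided Bayes factor BF+0 for a one-sample t-test.

    ASSUMPTIONS (not taken from the article): normal approximation to the
    t likelihood, which is only reasonable for large n, and a prior scale
    of about 0.707.
    """
    root_n = math.sqrt(n)
    # The likelihood is negligible beyond ~12 standard errors above its peak.
    upper = max(t, 0.0) / root_n + 12.0 / root_n
    h = upper / steps
    marginal_h_plus = 0.0
    for i in range(steps):
        delta = (i + 0.5) * h  # midpoint rule over delta in [0, upper]
        marginal_h_plus += (
            normal_pdf(t, mean=delta * root_n) * half_cauchy_pdf(delta, scale) * h
        )
    # Under H0 the effect size is exactly zero, so t is predicted as N(0, 1).
    return marginal_h_plus / normal_pdf(t)


if __name__ == "__main__":
    n, t_crit = 1000, 1.65  # values quoted in the caption
    print("d    =", t_crit / math.sqrt(n))
    print("BF+0 =", bf_plus0_normal_approx(t_crit, n))
```

Under these assumptions the result should land close to the BF₊₀ quoted in the caption and below the 1/3 bound, even though the same t-value is ‘significant’ at P=0.05. The approximation should not be used for small samples such as n=4, where the exact t likelihood is needed.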

Extended Data Fig. 2 Evidence for or against a factor in a Bayesian ANOVA.

A Bayesian ANOVA is a form of model comparison. This figure illustrates how the Bayes factor can provide evidence for a simpler model by concentrating its predictions on a single parameter value. This example ANOVA determines whether or not the data D depend on the value of the factor Group by comparing the Null Model D=0*Group (left) against the Group Model D=β*Group, with a Cauchy prior on β (right). The top row illustrates the prior probability attributed to the different values of β under the two competing models. Note how both models include β = 0 as a possibility, but given that the probability values must integrate to 1 over the entire β space, for the Null Model p(β = 0) = 1, while for the Group Model the probability is distributed across all plausible alternative values. The middle row shows the predicted t-values based on these priors, where t represents the difference between the data from the two groups as in Fig. 2. Note how these predictions are more peaked for the Null Model than for the Group Model. The bottom row compares the predicted probability of finding particular t-values under the two models, and shows how values close to zero (i.e., small or no difference between the groups) are predicted more often by the Null Model than by the Group Model, while the opposite is true for large t-values. If conducting the experiment reveals a measured t-value close to zero, the Bayes factor for including the factor Group would be substantially below 1, providing evidence for the absence of an effect of Group, while the inverse would be true for high t-values.
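The comparison of predictions described in this caption can be sketched numerically. The code below is an illustration under stated assumptions, not the procedure implemented in JASP: it approximates the sampling distribution of t as normal with unit variance, uses a hypothetical two-group design with 20 observations per group, and takes a Cauchy prior scale of 0.5 as an assumed value. All function names are hypothetical.

```python
import math


def normal_pdf(x, mean=0.0):
    """Density of a unit-variance normal distribution at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def cauchy_pdf(beta, scale):
    """Density of a Cauchy(0, scale) prior at beta."""
    return 1.0 / (math.pi * scale * (1.0 + (beta / scale) ** 2))


def predicted_density_null(t):
    """Null Model: all prior mass sits on beta = 0, so t is predicted as N(0, 1)."""
    return normal_pdf(t)


def predicted_density_group(t, n1, n2, scale=0.5, steps=20000):
    """Group Model: average the likelihood of t over the Cauchy prior on beta.

    ASSUMPTIONS: normal approximation to the t likelihood; beta is a
    standardized group difference; scale=0.5 is an illustrative choice.
    """
    root_n_eff = math.sqrt(n1 * n2 / (n1 + n2))  # effective sample size
    centre = t / root_n_eff                      # beta that best fits t
    half_width = 12.0 / root_n_eff               # likelihood is ~0 outside
    lower = centre - half_width
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        beta = lower + (i + 0.5) * h  # midpoint rule
        total += normal_pdf(t, mean=beta * root_n_eff) * cauchy_pdf(beta, scale) * h
    return total


def bayes_factor_for_group(t, n1, n2, scale=0.5):
    """Ratio of predictions: Group Model over Null Model for an observed t."""
    return predicted_density_group(t, n1, n2, scale) / predicted_density_null(t)


if __name__ == "__main__":
    for t in (0.0, 1.0, 2.0, 3.0):
        print(t, bayes_factor_for_group(t, n1=20, n2=20))
```

Because the Null Model concentrates its predictions around t=0, the ratio falls below 1 for t-values near zero and rises above 1 for large t-values, which is the pattern the bottom row of the figure describes.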

Extended Data Fig. 3

Examples of how to report results.

Supplementary information

Supplementary Information

Supplementary Note on continuous testing and Supplementary Fig. 1.

About this article

Cite this article

Keysers, C., Gazzola, V. & Wagenmakers, E.-J. Using Bayes factor hypothesis testing in neuroscience to establish evidence of absence. Nat. Neurosci. 23, 788–799 (2020). https://doi.org/10.1038/s41593-020-0660-4

Received: 11 December 2019
Accepted: 21 May 2020
Published: 29 June 2020
Issue Date: July 2020
DOI: https://doi.org/10.1038/s41593-020-0660-4
Springer Nature America, Inc.
