The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective

Brief Report

Abstract

In the practice of data analysis, there is a conceptual distinction between hypothesis testing, on the one hand, and estimation with quantified uncertainty on the other. Among frequentists in psychology, a shift of emphasis from hypothesis testing to estimation has been dubbed “the New Statistics” (Cumming 2014). A second conceptual distinction is between frequentist methods and Bayesian methods. Our main goal in this article is to explain how Bayesian methods achieve the goals of the New Statistics better than frequentist methods. The article reviews frequentist and Bayesian approaches to hypothesis testing and to estimation with confidence or credible intervals. The article also describes Bayesian approaches to meta-analysis, randomized controlled trials, and power analysis.
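To make the estimation approach concrete, here is a minimal illustrative sketch (not code from the article itself) of two ideas named in the keywords: computing a highest density interval (HDI) from posterior samples and applying a region-of-practical-equivalence (ROPE) decision rule. The function names, the simulated posterior, and the ROPE limits of (-0.1, 0.1) are all assumptions chosen for illustration.

```python
import numpy as np

def hdi(samples, mass=0.95):
    """Highest density interval: the narrowest interval that contains
    `mass` proportion of the posterior samples."""
    sorted_s = np.sort(np.asarray(samples))
    n = len(sorted_s)
    k = int(np.ceil(mass * n))  # number of samples inside the interval
    # Width of every candidate interval spanning k consecutive samples.
    widths = sorted_s[k - 1:] - sorted_s[: n - k + 1]
    i = int(np.argmin(widths))  # narrowest candidate
    return sorted_s[i], sorted_s[i + k - 1]

def rope_decision(hdi_lo, hdi_hi, rope=(-0.1, 0.1)):
    """ROPE rule: accept the null value if the HDI lies entirely inside
    the ROPE, reject it if the HDI lies entirely outside, else withhold
    a decision."""
    if rope[0] <= hdi_lo and hdi_hi <= rope[1]:
        return "accept null"
    if hdi_hi < rope[0] or hdi_lo > rope[1]:
        return "reject null"
    return "undecided"

# Hypothetical posterior for an effect size, centered at 0.5.
rng = np.random.default_rng(1)
posterior = rng.normal(0.5, 0.1, 100_000)
lo, hi = hdi(posterior)           # roughly (0.30, 0.70)
print(rope_decision(lo, hi))      # HDI lies wholly above the ROPE
```

In this sketch the decision is "reject null" because the entire 95% HDI falls outside the ROPE; with a posterior concentrated near zero the same rule would instead accept the null value as practically equivalent to zero.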

Keywords

Null hypothesis significance testing · Bayesian inference · Bayes factor · Confidence interval · Credible interval · Highest density interval · Region of practical equivalence · Meta-analysis · Power analysis · Effect size · Randomized controlled trial · Equivalence testing

References

  1. Allenby, G.M., Bakken, D.G., & Rossi, P.E. (2004). The hierarchical Bayesian revolution: How Bayesian methods have changed the face of marketing research. Marketing Research, 16, 20–25.
  2. Anderson, D.R., Burnham, K.P., & Thompson, W.L. (2000). Null hypothesis testing: Problems, prevalence, and an alternative. The Journal of Wildlife Management, 64(4), 912–923.
  3. Beaumont, M.A., & Rannala, B. (2004). The Bayesian revolution in genetics. Nature Reviews Genetics.
  4. Berry, S.M., Carlin, B.P., Lee, J.J., & Müller, P. (2011). Bayesian adaptive methods for clinical trials. Boca Raton, FL: CRC Press.
  5. Brooks, S.P. (2003). Bayesian computation: A statistical revolution. Philosophical Transactions of the Royal Society of London, Series A, 361(1813), 2681–2697.
  6. Brophy, J.M., Joseph, L., & Rouleau, J.L. (2001). β-blockers in congestive heart failure: A Bayesian meta-analysis. Annals of Internal Medicine, 134, 550–560.
  7. Carlin, B.P., & Louis, T.A. (2009). Bayesian methods for data analysis, 3rd edn. Boca Raton, FL: CRC Press.
  8. Cohen, B.H. (2008). Explaining psychological statistics, 3rd edn. Hoboken, NJ: Wiley.
  9. Cohen, J. (1988). Statistical power analysis for the behavioral sciences, 2nd edn. Hillsdale, NJ: Erlbaum.
  10. Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003.
  11. Cox, D.R. (2006). Principles of statistical inference. Cambridge, UK: Cambridge University Press.
  12. Cumming, G. (2007). Inference by eye: Pictures of confidence intervals and thinking about levels of confidence. Teaching Statistics, 29(3), 89–93.
  13. Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25(1), 7–29.
  14. Cumming, G., & Fidler, F. (2009). Confidence intervals: Better answers to better questions. Zeitschrift für Psychologie / Journal of Psychology, 217(1), 15–26.
  15. Cumming, G., & Finch, S. (2001). A primer on the understanding, use and calculation of confidence intervals based on central and noncentral distributions. Educational and Psychological Measurement, 61, 530–572.
  16. Dienes, Z. (2014). Using Bayes to get the most out of non-significant results. Frontiers in Psychology, 5, 781.
  17. Dienes, Z. (2016). How Bayes factors change scientific practice. Journal of Mathematical Psychology, 72, 78–89. doi:10.1016/j.jmp.2015.10.003.
  18. Doyle, A.C. (1890). The sign of four. London: Spencer Blackett.
  19. Edwards, W., Lindman, H., & Savage, L.J. (1963). Bayesian statistical inference for psychological research. Psychological Review, 70, 193–242.
  20. Freedman, L.S., Lowe, D., & Macaskill, P. (1984). Stopping rules for clinical trials incorporating clinical opinion. Biometrics, 40, 575–586.
  21. Gallistel, C.R. (2009). The importance of proving the null. Psychological Review, 116(2), 439–453.
  22. Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., & Rubin, D.B. (2013). Bayesian data analysis, 3rd edn. Boca Raton, FL: CRC Press.
  23. Gigerenzer, G., & Marewski, J.N. (2015). Surrogate science: The idol of a universal method for scientific inference. Journal of Management, 41(2), 421–440.
  24. Greenland, S., Senn, S.J., Rothman, K.J., Carlin, J.B., Poole, C., Goodman, S.N., & Altman, D.G. (2016). Statistical tests, p values, confidence intervals, and power: A guide to misinterpretations. The American Statistician. Retrieved from http://dx.doi.org/10.1080/00031305.2016.1154108.
  25. Gregory, P.C. (2001). A Bayesian revolution in spectral analysis. AIP (American Institute of Physics) Conference Proceedings, 568, 557. Retrieved from http://dx.doi.org/10.1063/1.1381917.
  26. Hartung, J., Knapp, G., & Sinha, B.K. (2008). Bayesian meta-analysis. In Statistical meta-analysis with applications (pp. 155–170). Hoboken, NJ: Wiley.
  27. Hobbs, B.P., & Carlin, B.P. (2008). Practical Bayesian design and analysis for drug and device clinical trials. Journal of Biopharmaceutical Statistics, 18(1), 54–80.
  28. Howell, D.C. (2013). Statistical methods for psychology, 8th edn. Belmont, CA: Wadsworth / Cengage Learning.
  29. Howson, C., & Urbach, P. (2006). Scientific reasoning: The Bayesian approach, 3rd edn. Chicago: Open Court.
  30. Jeffreys, H. (1961). Theory of probability. Oxford, UK: Oxford University Press.
  31. Johnson, D.H. (1995). Statistical sirens: The allure of nonparametrics. Ecology, 76, 1998–2000.
  32. Johnson, D.H. (1999). The insignificance of statistical significance testing. Journal of Wildlife Management, 63, 763–772.
  33. Kass, R.E., & Raftery, A.E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773–795.
  34. Kelley, K. (2013). Effect size and sample size planning. In Little, T.D. (Ed.), Oxford handbook of quantitative methods (Vol. 1: Foundations, pp. 206–222). New York: Oxford University Press.
  35. Kline, R.B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research. Washington, DC: American Psychological Association.
  36. Kruschke, J.K. (2011a). Bayesian assessment of null values via parameter estimation and model comparison. Perspectives on Psychological Science, 6(3), 299–312.
  37. Kruschke, J.K. (2011b). Doing Bayesian data analysis: A tutorial with R and BUGS. Burlington, MA: Academic Press / Elsevier.
  38. Kruschke, J.K. (2013). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General, 142(2), 573–603. http://dx.doi.org/10.1037/a0029146.
  39. Kruschke, J.K. (2015). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan, 2nd edn. Burlington, MA: Academic Press / Elsevier.
  40. Kruschke, J.K., Aguinis, H., & Joo, H. (2012). The time has come: Bayesian methods for data analysis in the organizational sciences. Organizational Research Methods, 15, 722–752. http://dx.doi.org/10.1177/1094428112457829.
  41. Kruschke, J.K., & Liddell, T.M. (2015). Bayesian data analysis for newcomers. (in preparation).
  42. Kruschke, J.K., & Vanpaemel, W. (2015). Bayesian estimation in hierarchical models. In Busemeyer, J.R., Townsend, J.T., Wang, Z.J., & Eidels, A. (Eds.), Oxford handbook of computational and mathematical psychology. Oxford University Press.
  43. Lakens, D. (2014). Performing high-powered studies efficiently with sequential analyses. European Journal of Social Psychology, 44(7), 701–710.
  44. Lazarus, R.S., & Eriksen, C.W. (1952). Effects of failure stress upon skilled performance. Journal of Experimental Psychology, 43(2), 100–105. http://dx.doi.org/10.1037/h0056614.
  45. Lee, M.D., & Wagenmakers, E.-J. (2014). Bayesian cognitive modeling: A practical course. Cambridge, England: Cambridge University Press.
  46. Lesaffre, E. (2008). Superiority, equivalence, and non-inferiority trials. Bulletin of the NYU Hospital for Joint Diseases, 66(2), 150–154.
  47. Liddell, T.M., & Kruschke, J.K. (2014). Ostracism and fines in a public goods game with accidental contributions: The importance of punishment type. Judgment and Decision Making, 9(6), 523–547.
  48. Lindley, D.V. (1975). The future of statistics: A Bayesian 21st century. Advances in Applied Probability, 7, 106–115.
  49. Lunn, D., Jackson, C., Best, N., Thomas, A., & Spiegelhalter, D. (2013). The BUGS book: A practical introduction to Bayesian analysis. Boca Raton, FL: CRC Press.
  50. Maxwell, S.E., & Delaney, H.D. (2004). Designing experiments and analyzing data: A model comparison perspective, 2nd edn. Mahwah, NJ: Erlbaum.
  51. Maxwell, S.E., Kelley, K., & Rausch, J.R. (2008). Sample size planning for statistical power and accuracy in parameter estimation. Annual Review of Psychology, 59, 537–563.
  52. Mayo, D.G. (2016). Don’t throw out the error control baby with the bad statistics bathwater: A commentary. The American Statistician. Retrieved from http://dx.doi.org/10.1080/00031305.2016.1154108.
  53. Mayo, D.G., & Spanos, A. (2011). Error statistics. In Bandyopadhyay, P.S., & Forster, M.R. (Eds.), Handbook of the philosophy of science. Volume 7: Philosophy of statistics (pp. 153–198). Elsevier.
  54. McGrayne, S.B. (2011). The theory that would not die. Yale University Press.
  55. Meehl, P.E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34, 103–115.
  56. Meehl, P.E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46(4), 806–834.
  57. Meehl, P.E. (1997). The problem is epistemology, not statistics: Replace significance tests by confidence intervals and quantify accuracy of risky numerical predictions. In Harlow, L.L., Mulaik, S.A., & Steiger, J.H. (Eds.), What if there were no significance tests? (pp. 395–425). Mahwah, NJ: Erlbaum.
  58. Morey, R.D., Rouder, J.N., & Jamil, T. (2015). BayesFactor package for R. http://cran.r-project.org/web/packages/BayesFactor/index.html.
  59. Ntzoufras, I. (2009). Bayesian modeling using WinBUGS. Hoboken, NJ: Wiley.
  60. Pitchforth, J.O., & Mengersen, K.L. (2013). Bayesian meta-analysis. In Alston, C.L., Mengersen, K.L., & Pettitt, A.N. (Eds.), Case studies in Bayesian statistical modelling and analysis. Wiley.
  61. Plummer, M. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), Vienna, Austria. ISSN 1609-395X.
  62. Plummer, M. (2012). JAGS version 3.3.0 user manual [Computer software manual].
  63. Poole, C. (1987). Beyond the confidence interval. American Journal of Public Health, 77(2), 195–199.
  64. Rogers, J.L., Howard, K.I., & Vessey, J.T. (1993). Using significance tests to evaluate equivalence between two experimental groups. Psychological Bulletin, 113(3), 553–565.
  65. Rosenthal, R. (1979). The “file drawer problem” and tolerance for null results. Psychological Bulletin, 86(3), 638–641.
  66. Rothman, K.J. (2016). Disengaging from statistical significance. The American Statistician. Retrieved from http://dx.doi.org/10.1080/00031305.2016.1154108.
  67. Rouder, J.N., & Morey, R.D. (2011). A Bayes factor meta-analysis of Bem’s ESP claim. Psychonomic Bulletin and Review, 18, 682–689.
  68. Rouder, J.N., Morey, R.D., & Province, J.M. (2013). A Bayes factor meta-analysis of recent extrasensory perception experiments: Comment on Storm, Tressoldi, and Di Risio (2010). Psychological Bulletin, 139(1), 241–247.
  69. Rouder, J.N., Speckman, P.L., Sun, D., Morey, R.D., & Iverson, G. (2009). Bayesian t-tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin and Review, 16, 225–237.
  70. Sagarin, B.J., Ambler, J.K., & Lee, E.M. (2014). An ethical approach to peeking at data. Perspectives on Psychological Science, 9(3), 293–304.
  71. Savage, I.R. (1957). Nonparametric statistics. Journal of the American Statistical Association, 52, 331–344.
  72. Schmidt, F.L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1(2), 115–129.
  73. Schönbrodt, F.D., Wagenmakers, E.-J., Zehetleitner, M., & Perugini, M. (2016). Sequential hypothesis testing with Bayes factors: Efficiently testing mean differences. Psychological Methods. http://dx.doi.org/10.1037/met0000061.
  74. Schuirmann, D.J. (1987). A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics, 15(6), 657–680.
  75. Schweder, T., & Hjort, N.L. (2002). Confidence and likelihood. Scandinavian Journal of Statistics, 29, 309–332.
  76. Serlin, R.C., & Lapsley, D.K. (1985). Rationality in psychological research: The good-enough principle. American Psychologist, 40(1), 73–83.
  77. Serlin, R.C., & Lapsley, D.K. (1993). Rational appraisal of psychological research and the good enough principle. In Keren, G., & Lewis, C. (Eds.) (pp. 199–228). Hillsdale, NJ: Erlbaum.
  78. Singh, K., Xie, M., & Strawderman, W.E. (2007). Confidence distribution (CD): Distribution estimator of a parameter. In Liu, R., et al. (Eds.), Complex datasets and inverse problems (Vol. 54, pp. 132–150). Beachwood, OH: Institute of Mathematical Statistics.
  79. Spiegelhalter, D.J., Freedman, L.S., & Parmar, M.K.B. (1994). Bayesian approaches to randomized trials. Journal of the Royal Statistical Society, Series A, 157, 357–416.
  80. Stan Development Team (2012). Stan: A C++ library for probability and sampling, version 1.1. Retrieved from http://mc-stan.org/citations.html.
  81. Sullivan, K.M., & Foster, D.A. (1990). Use of the confidence interval function. Epidemiology, 1(1), 39–42.
  82. Sutton, A.J., & Abrams, K.R. (2001). Bayesian methods in meta-analysis and evidence synthesis. Statistical Methods in Medical Research, 10(4), 277–303.
  83. Trafimow, D., & Marks, M. (2015). Editorial. Basic and Applied Social Psychology, 37, 1–2.
  84. Vanpaemel, W., & Lee, M.D. (2012). Using priors to formalize theory: Optimal attention and the generalized context model. Psychonomic Bulletin and Review, 19, 1047–1056.
  85. Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin and Review, 14(5), 779–804.
  86. Wagenmakers, E.-J., Lodewyckx, T., Kuriyal, H., & Grasman, R. (2010). Bayesian hypothesis testing for psychologists: A tutorial on the Savage–Dickey method. Cognitive Psychology, 60, 158–189.
  87. Wasserstein, R.L., & Lazar, N.A. (2016). The ASA’s statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129–133. http://dx.doi.org/10.1080/00031305.2016.1154108.
  88. Wellek, S. (2010). Testing statistical hypotheses of equivalence and noninferiority, 2nd edn. Boca Raton, FL: Chapman and Hall / CRC Press.
  89. Westlake, W.J. (1976). Symmetrical confidence intervals for bioequivalence trials. Biometrics, 32, 741–744.
  90. Westlake, W.J. (1981). Response to bioequivalence testing – a need to rethink. Biometrics, 37, 591–593.
  91. Wetzels, R., Matzke, D., Lee, M.D., Rouder, J., Iverson, G., & Wagenmakers, E.-J. (2011). Statistical evidence in experimental psychology: An empirical comparison using 855 t tests. Perspectives on Psychological Science, 6(3), 291–298.
  92. Wetzels, R., Raaijmakers, J.G.W., Jakab, E., & Wagenmakers, E.-J. (2009). How to quantify support for and against the null hypothesis: A flexible WinBUGS implementation of a default Bayesian t test. Psychonomic Bulletin and Review, 16(4), 752–760.
  93. Woodworth, G. (2004). Biostatistics: A Bayesian introduction. Wiley.
  94. Yusuf, S., Peto, R., Lewis, J., Collins, R., & Sleight, P. (1985). Beta blockade during and after myocardial infarction: An overview of the randomized trials. Progress in Cardiovascular Diseases, 27(5), 335–371.

Copyright information

© Psychonomic Society, Inc. 2017

Authors and Affiliations

  1. Indiana University, Bloomington, USA