Volume 46, Issue 1, pp 121–144

Beyond p values: utilizing multiple methods to evaluate evidence

  • K. D. Valentine (corresponding author)
  • Erin M. Buchanan
  • John E. Scofield
  • Marshall T. Beauchamp
Original Paper


Null hypothesis significance testing (NHST) is cited as a threat to validity and reproducibility. While many individuals suggest that we focus on altering the p value at which we deem an effect significant, we believe this suggestion is short-sighted. Alternative procedures (i.e., Bayesian analyses and observation-oriented modeling: OOM) can be more powerful and meaningful to our discipline. However, these methodologies are less frequently utilized and are rarely discussed in combination with NHST. Herein, we discuss three methodologies (NHST, Bayesian model comparison, and OOM), then compare the possible interpretations of three analyses (ANOVA, Bayes factor, and an ordinal pattern analysis) in various data environments using a frequentist simulation study. We found that changing significance thresholds had little effect on conclusions. Furthermore, we suggest that evaluating multiple estimates as evidence of an effect allows for more robust and nuanced interpretations of results and implies the need to redefine evidentiary value and reporting practices.


Keywords: Null hypothesis testing, p values, Bayes factors, Observation-oriented modeling, Evidence


Compliance with ethical standards

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.


Copyright information

© The Behaviormetric Society 2019

Authors and Affiliations

  1. University of Missouri, Columbia, USA
  2. Harrisburg University of Science and Technology, Harrisburg, USA
  3. University of Missouri, Kansas City, USA
