Beyond p values: utilizing multiple methods to evaluate evidence

Valentine, K. D.; Buchanan, Erin M.; Scofield, John E.; Beauchamp, Marshall T.

doi:10.1007/s41237-019-00078-4

Original Paper
Published: 08 March 2019

Volume 46, pages 121–144 (2019)
Behaviormetrika

K. D. Valentine ORCID: orcid.org/0000-0001-6349-5395¹,
Erin M. Buchanan²,
John E. Scofield¹ &
Marshall T. Beauchamp³

Abstract

Null hypothesis significance testing (NHST) has been cited as a threat to validity and reproducibility. While many researchers suggest that we focus on altering the p value at which we deem an effect significant, we believe this suggestion is short-sighted. Alternative procedures (i.e., Bayesian analyses and observation-oriented modeling, OOM) can be more powerful and more meaningful to our discipline. However, these methodologies are used less frequently and are rarely discussed in combination with NHST. Herein, we discuss three methodologies (NHST, Bayesian model comparison, and OOM), then compare the possible interpretations of three analyses (ANOVA, Bayes factor, and ordinal pattern analysis) in various data environments using a frequentist simulation study. We found that changing significance thresholds had little effect on conclusions. Furthermore, we suggest that evaluating multiple estimates as evidence of an effect allows for more robust and nuanced interpretations of results, and that this implies a need to redefine evidentiary value and reporting practices.
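To make the comparison described in the abstract concrete, the following is a minimal, standard-library-only Python sketch that applies three kinds of evidence to the same simulated one-way designs: a classical ANOVA p value judged at .05 and at .005, a Bayes factor, and a percentage of correct classifications (PCC) in the spirit of OOM's ordinal pattern analysis. It is not the authors' code or simulation design. Assumptions to keep in mind: the Bayes factor uses the BIC approximation, BF10 ≈ exp(−ΔBIC/2), in place of default-prior Bayes factors; the ordinal analysis simply checks every cross-group pair against a hypothesized increasing order, counts ties as misses, and uses a label-shuffling c value; the shifts, group sizes, and the BF10 > 3 cutoff are illustrative choices. All function names (one_way_anova, bf10_bic, pcc, c_value, simulate) are hypothetical.

```python
# Hypothetical sketch: names and settings are illustrative, not the authors' method.
# Assumption: BF10 uses the BIC approximation, not default-prior Bayes factors.
import math
import random

def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # the odd step barely changed h: converged
            break
    return h

def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def one_way_anova(groups):
    """Classical one-way ANOVA; returns (F, df1, df2, p, SS_within, SS_total)."""
    pooled = [x for g in groups for x in g]
    n, k = len(pooled), len(groups)
    grand = sum(pooled) / n
    means = [sum(g) / len(g) for g in groups]
    ss_total = sum((x - grand) ** 2 for x in pooled)
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df1, df2 = k - 1, n - k
    f = ((ss_total - ss_within) / df1) / (ss_within / df2)
    # Upper-tail F probability: P(F > f) = I_x(df2/2, df1/2), x = df2 / (df2 + df1 * f).
    p = betai(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))
    return f, df1, df2, p, ss_within, ss_total

def bf10_bic(ss_within, ss_total, n, k):
    """BIC approximation to BF10: k group means vs a single grand mean."""
    delta_bic = n * math.log(ss_within / ss_total) + (k - 1) * math.log(n)
    return math.exp(-delta_bic / 2.0)

def pcc(groups):
    """Percent of cross-group pairs ordered as hypothesized (group 0 < group 1 < ...).
    Simplified sketch: ties count as misses."""
    pairs = [(a, b) for i, gi in enumerate(groups) for gj in groups[i + 1:]
             for a in gi for b in gj]
    return 100.0 * sum(a < b for a, b in pairs) / len(pairs)

def c_value(groups, n_perm=500, seed=1):
    """Share of random relabelings whose PCC is at least the observed PCC."""
    rng = random.Random(seed)
    observed = pcc(groups)
    pooled = [x for g in groups for x in g]
    cuts = [0]
    for g in groups:
        cuts.append(cuts[-1] + len(g))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled = [pooled[cuts[i]:cuts[i + 1]] for i in range(len(groups))]
        hits += pcc(shuffled) >= observed
    return hits / n_perm

def simulate(shift, n_per_group=20, k=3, reps=200, seed=42):
    """Share of simulated datasets flagged by each criterion; means step up by `shift` SDs."""
    rng = random.Random(seed)
    flags = {"p<.05": 0, "p<.005": 0, "BF10>3": 0}
    for _ in range(reps):
        groups = [[rng.gauss(i * shift, 1.0) for _ in range(n_per_group)] for i in range(k)]
        _, _, _, p, ssw, sst = one_way_anova(groups)
        flags["p<.05"] += p < 0.05
        flags["p<.005"] += p < 0.005
        flags["BF10>3"] += bf10_bic(ssw, sst, n_per_group * k, k) > 3
    return {key: count / reps for key, count in flags.items()}

if __name__ == "__main__":
    for shift in (0.0, 0.2, 0.5):
        print(shift, simulate(shift))
```

Running the file prints, for each illustrative shift, the share of datasets flagged by each criterion, so the p < .05 and p < .005 columns can be compared side by side with the Bayes factor rate. The pcc and c_value helpers are left out of simulate to keep the runtime short; under these assumptions they can be called on any single simulated dataset to add the ordinal evidence to the same comparison.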

Author information

Authors and Affiliations

University of Missouri, 210 McAlester Ave, Columbia, MO, 65211, USA
K. D. Valentine & John E. Scofield
Harrisburg University of Science and Technology, Harrisburg, USA
Erin M. Buchanan
University of Missouri, Kansas City, USA
Marshall T. Beauchamp

Corresponding author

Correspondence to K. D. Valentine.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Communicated by Kensuke Okada.

About this article

Cite this article

Valentine, K.D., Buchanan, E.M., Scofield, J.E. et al. Beyond p values: utilizing multiple methods to evaluate evidence. Behaviormetrika 46, 121–144 (2019). https://doi.org/10.1007/s41237-019-00078-4

Received: 17 August 2018
Accepted: 25 February 2019
Published: 08 March 2019
Issue Date: 01 April 2019
DOI: https://doi.org/10.1007/s41237-019-00078-4
