A practical solution to the pervasive problems of p values

Abstract

In the field of psychology, the practice of p value null-hypothesis testing is as widespread as ever. Despite this popularity, or perhaps because of it, most psychologists are not aware of the statistical peculiarities of the p value procedure. In particular, p values are based on data that were never observed, and these hypothetical data are themselves influenced by subjective intentions. Moreover, p values do not quantify statistical evidence. This article reviews these p value problems and illustrates each problem with concrete examples. The three problems are familiar to statisticians but may be new to psychologists. A practical solution to these p value problems is to adopt a model selection perspective and use the Bayesian information criterion (BIC) for statistical inference (Raftery, 1995). The BIC provides an approximation to a Bayesian hypothesis test, does not require the specification of priors, and can be easily calculated from SPSS output.
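
As a rough illustration of the BIC-based inference summarized above, the following sketch shows one way the calculation might look for a normal linear model (e.g., ANOVA or regression), using only the error sums of squares that standard statistical output reports. It assumes that BIC can be written, up to an additive constant, as n ln(SSE/n) + k ln(n), that the Bayes factor is approximated by exp(ΔBIC/2), and that the null and alternative hypotheses are equally likely a priori. The function names and the example numbers are hypothetical and are not taken from the article.

```python
import math


def bic_normal(sse: float, n: int, k: int) -> float:
    """BIC of a normal linear model, up to an additive constant.

    sse: error (residual) sum of squares of the fitted model
    n:   number of observations
    k:   number of free parameters
    """
    return n * math.log(sse / n) + k * math.log(n)


def bic_h0_posterior(sse_h0: float, sse_h1: float, n: int,
                     k_h0: int, k_h1: int) -> tuple[float, float, float]:
    """Return (delta_bic_10, bf_01, p_h0) for two models fit to the same data.

    delta_bic_10 = BIC(H1) - BIC(H0); positive values favor H0.
    bf_01 is the BIC approximation to the Bayes factor in favor of H0.
    p_h0 is the approximate posterior probability of H0, assuming equal
    prior probabilities for H0 and H1 (an assumption of this sketch).
    """
    delta_bic_10 = bic_normal(sse_h1, n, k_h1) - bic_normal(sse_h0, n, k_h0)
    bf_01 = math.exp(delta_bic_10 / 2)
    p_h0 = bf_01 / (1 + bf_01)
    return delta_bic_10, bf_01, p_h0


if __name__ == "__main__":
    # Hypothetical sums of squares: H0 leaves 420.0 unexplained, while H1
    # (one extra parameter) leaves 400.0 unexplained, with n = 40.
    delta, bf01, p0 = bic_h0_posterior(420.0, 400.0, n=40, k_h0=1, k_h1=2)
    print(f"delta BIC = {delta:.3f}, BF01 = {bf01:.3f}, P(H0 | data) = {p0:.3f}")
```

Because the additive constant cancels in the difference, only the two error sums of squares, the sample size, and the parameter counts are needed, which is why the quantity can be computed by hand from an ANOVA table.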

References

  1. Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.

  2. Anscombe, F. J. (1954). Fixed-sample-size analysis of sequential observations. Biometrics, 10, 89–100.

  3. Anscombe, F. J. (1963). Sequential medical trials. Journal of the American Statistical Association, 58, 365–383.

  4. Armitage, P. (1957). Restricted sequential procedures. Biometrika, 44, 9–26.

  5. Armitage, P. (1960). Sequential medical trials. Springfield, IL: Thomas.

  6. Armitage, P., McPherson, C. K., & Rowe, B. C. (1969). Repeated significance tests on accumulating data. Journal of the Royal Statistical Society: Series A, 132, 235–244.

  7. Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423–437.

  8. Barnard, G. A. (1947). The meaning of a significance level. Biometrika, 34, 179–182.

  9. Basu, D. (1964). Recovery of ancillary information. Sankhya: Series A, 26, 3–16.

  10. Bayarri, M.-J., & Berger, J. O. (2004). The interplay of Bayesian and frequentist analysis. Statistical Science, 19, 58–80.

  11. Berger, J. O. (1985). Statistical decision theory and Bayesian analysis (2nd ed.). New York: Springer.

  12. Berger, J. O. (2003). Could Fisher, Jeffreys and Neyman have agreed on testing? Statistical Science, 18, 1–32.

  13. Berger, J. O., & Berry, D. A. (1988a). The relevance of stopping rules in statistical inference. In S. S. Gupta & J. O. Berger (Eds.), Statistical decision theory and related topics IV (Vol. 1, pp. 29–72). New York: Springer.

  14. Berger, J. O., & Berry, D. A. (1988b). Statistical analysis and the illusion of objectivity. American Scientist, 76, 159–165.

  15. Berger, J. O., Boukai, B., & Wang, Y. (1997). Unified frequentist and Bayesian testing of a precise hypothesis (with discussion). Statistical Science, 12, 133–160.

  16. Berger, J. O., Brown, L., & Wolpert, R. (1994). A unified conditional frequentist and Bayesian test for fixed and sequential hypothesis testing. Annals of Statistics, 22, 1787–1807.

  17. Berger, J. O., & Delampady, M. (1987). Testing precise hypotheses. Statistical Science, 2, 317–352.

  18. Berger, J. O., & Mortera, J. (1999). Default Bayes factors for nonnested hypothesis testing. Journal of the American Statistical Association, 94, 542–554.

  19. Berger, J. O., & Pericchi, L. R. (1996). The intrinsic Bayes factor for model selection and prediction. Journal of the American Statistical Association, 91, 109–122.

  20. Berger, J. O., & Sellke, T. (1987). Testing a point null hypothesis: The irreconcilability of p values and evidence. Journal of the American Statistical Association, 82, 112–139.

  21. Berger, J. O., & Wolpert, R. L. (1988). The likelihood principle (2nd ed.). Hayward, CA: Institute of Mathematical Statistics.

  22. Bernardo, J. M., & Smith, A. F. M. (1994). Bayesian theory. Chichester, U.K.: Wiley.

  23. Birnbaum, A. (1962). On the foundations of statistical inference (with discussion). Journal of the American Statistical Association, 53, 259–326.

  24. Birnbaum, A. (1977). The Neyman–Pearson theory as decision theory, and as inference theory; with a criticism of the Lindley–Savage argument for Bayesian theory. Synthese, 36, 19–49.

  25. Box, G. E. P., & Tiao, G. C. (1973). Bayesian inference in statistical analysis. Reading, MA: Addison-Wesley.

  26. Browne, M. (2000). Cross-validation methods. Journal of Mathematical Psychology, 44, 108–132.

  27. Burdette, W. J., & Gehan, E. A. (1970). Planning and analysis of clinical studies. Springfield, IL: Thomas.

  28. Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach (2nd ed.). New York: Springer.

  29. Busemeyer, J. R., & Stout, J. C. (2002). A contribution of cognitive decision models to clinical assessment: Decomposing performance on the Bechara gambling task. Psychological Assessment, 14, 253–262.

  30. Christensen, R. (2005). Testing Fisher, Neyman, Pearson, and Bayes. American Statistician, 59, 121–126.

  31. Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.

  32. Cornfield, J. (1966). Sequential trials, sequential analysis, and the likelihood principle. American Statistician, 20, 18–23.

  33. Cornfield, J. (1969). The Bayesian outlook and its application. Biometrics, 25, 617–657.

  34. Cortina, J. M., & Dunlap, W. P. (1997). On the logic and purpose of significance testing. Psychological Methods, 2, 161–172.

  35. Cox, D. R. (1958). Some problems connected with statistical inference. Annals of Mathematical Statistics, 29, 357–372.

  36. Cox, D. R. (1971). The choice between alternative ancillary statistics. Journal of the Royal Statistical Society: Series B, 33, 251–255.

  37. Cox, R. T. (1946). Probability, frequency and reasonable expectation. American Journal of Physics, 14, 1–13.

  38. Cumming, G. (2007). Replication and p values: p values predict the future vaguely, but confidence intervals do better. Manuscript submitted for publication.

  39. D’Agostini, G. (1999). Teaching statistics in the physics curriculum: Unifying and clarifying role of subjective probability. American Journal of Physics, 67, 1260–1268.

  40. Dawid, A. P. (1984). Statistical theory: The prequential approach. Journal of the Royal Statistical Society: Series A, 147, 278–292.

  41. De Finetti, B. (1974). Theory of probability: A critical introductory treatment (Vols. 1 & 2; A. Machí & A. Smith, Trans.). London: Wiley.

  42. Diamond, G. A., & Forrester, J. S. (1983). Clinical trials and statistical verdicts: Probable grounds for appeal. Annals of Internal Medicine, 98, 385–394.

  43. Dickey, J. M. (1973). Scientific reporting and personal probabilities: Student’s hypothesis. Journal of the Royal Statistical Society: Series B, 35, 285–305.

  44. Dickey, J. M. (1977). Is the tail area useful as an approximate Bayes factor? Journal of the American Statistical Association, 72, 138–142.

  45. Dixon, P. (2003). The p value fallacy and how to avoid it. Canadian Journal of Experimental Psychology, 57, 189–202.

  46. Djurić, P. M. (1998). Asymptotic MAP criteria for model selection. IEEE Transactions on Signal Processing, 46, 2726–2735.

  47. Edwards, A. W. F. (1992). Likelihood. Baltimore: Johns Hopkins University Press.

  48. Edwards, W., Lindman, H., & Savage, L. J. (1963). Bayesian statistical inference for psychological research. Psychological Review, 70, 193–242.

  49. Efron, B. (2005). Bayesians, frequentists, and scientists. Journal of the American Statistical Association, 100, 1–5.

  50. Efron, B., & Tibshirani, R. (1997). Improvements on cross-validation: The .632+ bootstrap method. Journal of the American Statistical Association, 92, 548–560.

  51. Feller, W. (1940). Statistical aspects of ESP. Journal of Parapsychology, 4, 271–298.

  52. Feller, W. (1970). An introduction to probability theory and its applications: Vol. 1 (2nd ed.). New York: Wiley.

  53. Fine, T. L. (1973). Theories of probability: An examination of foundations. New York: Academic Press.

  54. Firth, D., & Kuha, J. (1999). Comments on “A critique of the Bayesian information criterion for model selection.” Sociological Methods & Research, 27, 398–402.

  55. Fisher, R. A. (1934). Statistical methods for research workers (5th ed.). London: Oliver & Boyd.

  56. Fisher, R. A. (1935a). The design of experiments. Edinburgh: Oliver & Boyd.

  57. Fisher, R. A. (1935b). The logic of inductive inference (with discussion). Journal of the Royal Statistical Society, 98, 39–82.

  58. Fisher, R. A. (1958). Statistical methods for research workers (13th ed.). New York: Hafner.

  59. Freireich, E. J., Gehan, E., Frei, E., III, Schroeder, L. R., Wolman, I. J., Anbari, R., et al. (1963). The effect of 6-mercaptopurine on the duration of steroid-induced remissions in acute leukemia: A model for evaluation of other potentially useful therapy. Blood, 21, 699–716.

  60. Frick, R. W. (1996). The appropriate use of null hypothesis testing. Psychological Methods, 1, 379–390.

  61. Friedman, L. M., Furberg, C. D., & DeMets, D. L. (1998). Fundamentals of clinical trials (3rd ed.). New York: Springer.

  62. Galavotti, M. C. (2005). A philosophical introduction to probability. Stanford: CSLI Publications.

  63. Geisser, S. (1975). The predictive sample reuse method with applications. Journal of the American Statistical Association, 70, 320–328.

  64. Gelman, A., & Rubin, D. B. (1999). Evaluating and using statistical methods in the social sciences. Sociological Methods & Research, 27, 403–410.

  65. Gigerenzer, G. (1993). The superego, the ego, and the id in statistical reasoning. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 311–339). Hillsdale, NJ: Erlbaum.

  66. Gigerenzer, G. (1998). We need statistical thinking, not statistical rituals. Behavioral & Brain Sciences, 21, 199–200.

  67. Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (Eds.) (1996). Markov chain Monte Carlo in practice. Boca Raton, FL: Chapman & Hall/CRC.

  68. Gill, J. (2002). Bayesian methods: A social and behavioral sciences approach. Boca Raton, FL: CRC Press.

  69. Glover, S., & Dixon, P. (2004). Likelihood ratios: A simple and flexible statistic for empirical psychologists. Psychonomic Bulletin & Review, 11, 791–806.

  70. Good, I. J. (1983). Good thinking: The foundations of probability and its applications. Minneapolis: University of Minnesota Press.

  71. Good, I. J. (1985). Weight of evidence: A brief survey. In J. M. Bernardo, M. H. DeGroot, D. V. Lindley, & A. F. M. Smith (Eds.), Bayesian statistics 2: Proceedings of the Second Valencia International Meeting, September 6–10, 1983 (pp. 249–269). Amsterdam: North-Holland.

  72. Goodman, S. N. (1993). p values, hypothesis tests, and likelihood: Implications for epidemiology of a neglected historical debate. American Journal of Epidemiology, 137, 485–496.

  73. Grünwald, P. [D.] (2000). Model selection based on minimum description length. Journal of Mathematical Psychology, 44, 133–152.

  74. Grünwald, P. D., Myung, I. J., & Pitt, M. A. (Eds.) (2005). Advances in minimum description length: Theory and applications. Cambridge, MA: MIT Press.

  75. Hacking, I. (1965). Logic of statistical inference. Cambridge: Cambridge University Press.

  76. Hagen, R. L. (1997). In praise of the null hypothesis statistical test. American Psychologist, 52, 15–24.

  77. Haldane, J. B. S. (1945). On a method of estimating frequencies. Biometrika, 33, 222–225.

  78. Hannan, E. J. (1980). The estimation of the order of an ARMA process. Annals of Statistics, 8, 1071–1081.

  79. Helland, I. S. (1995). Simple counterexamples against the conditionality principle. American Statistician, 49, 351–356.

  80. Hill, B. M. (1985). Some subjective Bayesian considerations in the selection of models. Econometric Reviews, 4, 191–246.

  81. Howson, C., & Urbach, P. (2005). Scientific reasoning: The Bayesian approach (3rd ed.). Chicago: Open Court.

  82. Hubbard, R., & Bayarri, M.-J. (2003). Confusion over measures of evidence (p’s) versus errors (α’s) in classical statistical testing. American Statistician, 57, 171–182.

  83. Jaynes, E. T. (1968). Prior probabilities. IEEE Transactions on Systems Science & Cybernetics, 4, 227–241.

  84. Jaynes, E. T. (2003). Probability theory: The logic of science. Cambridge: Cambridge University Press.

  85. Jeffreys, H. (1961). Theory of probability. Oxford: Oxford University Press.

  86. Jennison, C., & Turnbull, B. W. (1990). Statistical approaches to interim monitoring of medical trials: A review and commentary. Statistical Science, 5, 299–317.

  87. Kadane, J. B., Schervish, M. J., & Seidenfeld, T. (1996). Reasoning to a foregone conclusion. Journal of the American Statistical Association, 91, 1228–1235.

  88. Karabatsos, G. (2006). Bayesian nonparametric model selection and model testing. Journal of Mathematical Psychology, 50, 123–148.

  89. Kass, R. E. (1993). Bayes factors in practice. Statistician, 42, 551–560.

  90. Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 377–395.

  91. Kass, R. E., & Wasserman, L. (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association, 90, 928–934.

  92. Kass, R. E., & Wasserman, L. (1996). The selection of prior distributions by formal rules. Journal of the American Statistical Association, 91, 1343–1370.

  93. Killeen, P. R. (2005a). An alternative to null-hypothesis significance tests. Psychological Science, 16, 345–353.

  94. Killeen, P. R. (2005b). Replicability, confidence, and priors. Psychological Science, 16, 1009–1012.

  95. Killeen, P. R. (2006). Beyond statistical inference: A decision theory for science. Psychonomic Bulletin & Review, 13, 549–562.

  96. Klugkist, I., Laudy, O., & Hoijtink, H. (2005). Inequality constrained analysis of variance: A Bayesian approach. Psychological Methods, 10, 477–493.

  97. Lee, M. D. (2002). Generating additive clustering models with limited stochastic complexity. Journal of Classification, 19, 69–85.

  98. Lee, M. D., & Pope, K. J. (2006). Model selection for the rate problem: A comparison of significance testing, Bayesian, and minimum description length statistical inference. Journal of Mathematical Psychology, 50, 193–202.

  99. Lee, M. D., & Wagenmakers, E.-J. (2005). Bayesian statistical inference in psychology: Comment on Trafimow (2003). Psychological Review, 112, 662–668.

  100. Lee, P. M. (1989). Bayesian statistics: An introduction. New York: Oxford University Press.

  101. Lindley, D. V. (1957). A statistical paradox. Biometrika, 44, 187–192.

  102. Lindley, D. V. (1972). Bayesian statistics: A review. Philadelphia: Society for Industrial & Applied Mathematics.

  103. Lindley, D. V. (1977). The distinction between inference and decision. Synthese, 36, 51–58.

  104. Lindley, D. V. (1982). Scoring rules and the inevitability of probability. International Statistical Review, 50, 1–26.

  105. Lindley, D. V. (1993). The analysis of experimental data: The appreciation of tea and wine. Teaching Statistics, 15, 22–25.

  106. Lindley, D. V. (2004). That wretched prior. Significance, 1, 85–87.

  107. Lindley, D. V., & Phillips, L. D. (1976). Inference for a Bernoulli process (a Bayesian view). American Statistician, 30, 112–119.

  108. Lindley, D. V., & Scott, W. F. (1984). New Cambridge elementary statistical tables. Cambridge: Cambridge University Press.

  109. Loftus, G. R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 5, 161–171.

  110. Loftus, G. R. (2002). Analysis, interpretation, and visual presentation of experimental data. In H. Pashler (Ed. in Chief) & J. Wixted (Vol. Ed.), Stevens’ Handbook of experimental psychology: Vol. 4. Methodology in experimental psychology (3rd ed., pp. 339–390). New York: Wiley.

  111. Ludbrook, J. (2003). Interim analyses of data as they accumulate in laboratory experimentation. BMC Medical Research Methodology, 3, 15.

  112. McCarroll, D., Crays, N., & Dunlap, W. P. (1992). Sequential ANOVAs and Type I error rates. Educational & Psychological Measurement, 52, 387–393.

  113. Myung, I. J. (2000). The importance of complexity in model selection. Journal of Mathematical Psychology, 44, 190–204.

  114. Myung, I. J., Forster, M. R., & Browne, M. W. (Eds.) (2000). Model selection [Special issue]. Journal of Mathematical Psychology, 44(1).

  115. Myung, I. J., Navarro, D. J., & Pitt, M. A. (2006). Model selection by normalized maximum likelihood. Journal of Mathematical Psychology, 50, 167–179.

  116. Myung, I. J., & Pitt, M. A. (1997). Applying Occam’s razor in modeling cognition: A Bayesian approach. Psychonomic Bulletin & Review, 4, 79–95.

  117. Nelson, N., Rosenthal, R., & Rosnow, R. L. (1986). Interpretation of significance levels and effect sizes by psychological researchers. American Psychologist, 41, 1299–1301.

  118. Neyman, J. (1977). Frequentist probability and frequentist statistics. Synthese, 36, 97–131.

  119. Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society: Series A, 231, 289–337.

  120. Nickerson, R. S. (2000). Null hypothesis statistical testing: A review of an old and continuing controversy. Psychological Methods, 5, 241–301.

  121. O’Hagan, A. (1997). Fractional Bayes factors for model comparison. Journal of the Royal Statistical Society: Series B, 57, 99–138.

  122. O’Hagan, A. (2004). Dicing with the unknown. Significance, 1, 132–133.

  123. O’Hagan, A., & Forster, J. (2004). Kendall’s advanced theory of statistics: Vol. 2B. Bayesian inference (2nd ed.). London: Arnold.

  124. Pauler, D. K. (1998). The Schwarz criterion and related methods for normal linear models. Biometrika, 85, 13–27.

  125. Peto, R., Pike, M. C., Armitage, P., Breslow, N. E., Cox, D. R., Howard, S. V., et al. (1976). Design and analysis of randomized clinical trials requiring prolonged observation of each patient: I. Introduction and design. British Journal of Cancer, 34, 585–612.

  126. Pitt, M. A., Myung, I. J., & Zhang, S. (2002). Toward a method of selecting among computational models of cognition. Psychological Review, 109, 472–491.

  127. Pocock, S. J. (1983). Clinical trials: A practical approach. New York: Wiley.

  128. Pratt, J. W. (1961). [Review of Lehmann, E. L., Testing statistical hypotheses]. Journal of the American Statistical Association, 56, 163–167.

  129. Pratt, J. W. (1962). On the foundations of statistical inference: Discussion. Journal of the American Statistical Association, 57, 314–315.

  130. Raftery, A. E. (1993). Bayesian model selection in structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 163–180). Newbury Park, CA: Sage.

  131. Raftery, A. E. (1995). Bayesian model selection in social research. In P. V. Marsden (Ed.), Sociological methodology 1995 (pp. 111–196). Cambridge, MA: Blackwell.

  132. Raftery, A. E. (1996). Hypothesis testing and model selection. In W. R. Gilks, S. Richardson, & D. J. Spiegelhalter (Eds.), Markov chain Monte Carlo in practice (pp. 163–187). Boca Raton, FL: Chapman & Hall/CRC.

  133. Raftery, A. E. (1999). Bayes factors and BIC. Sociological Methods & Research, 27, 411–427.

  134. Rissanen, J. (2001). Strong optimality of the normalized ML models as universal codes and information in data. IEEE Transactions on Information Theory, 47, 1712–1717.

  135. Robert, C. P., & Casella, G. (1999). Monte Carlo statistical methods. New York: Springer.

  136. Rosenthal, R., & Gaito, J. (1963). The interpretation of levels of significance by psychological researchers. Journal of Psychology, 55, 33–38.

  137. Rouder, J. N., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an application in the theory of signal detection. Psychonomic Bulletin & Review, 12, 573–604.

  138. Rouder, J. N., Lu, J., Speckman, P., Sun, D., & Jiang, Y. (2005). A hierarchical model for estimating response time distributions. Psychonomic Bulletin & Review, 12, 195–223.

  139. Royall, R. M. (1997). Statistical evidence: A likelihood paradigm. London: Chapman & Hall.

  140. Savage, L. J. (1954). The foundations of statistics. New York: Wiley.

  141. Schervish, M. J. (1996). P values: What they are and what they are not. American Statistician, 50, 203–206.

  142. Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1, 115–129.

  143. Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.

  144. Sellke, T., Bayarri, M.-J., & Berger, J. O. (2001). Calibration of p values for testing precise null hypotheses. American Statistician, 55, 62–71.

  145. Shafer, G. (1982). Lindley’s paradox. Journal of the American Statistical Association, 77, 325–351.

  146. Siegmund, D. (1985). Sequential analysis: Tests and confidence intervals. New York: Springer.

  147. Smith, A. F. M., & Spiegelhalter, D. J. (1980). Bayes factors and choice criteria for linear models. Journal of the Royal Statistical Society: Series B, 42, 213–220.

  148. Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions (with discussion). Journal of the Royal Statistical Society: Series B, 36, 111–147.

  149. Strube, M. J. (2006). SNOOP: A program for demonstrating the consequences of premature and repeated null hypothesis testing. Behavior Research Methods, 38, 24–27.

  150. Stuart, A., Ord, J. K., & Arnold, S. (1999). Kendall’s advanced theory of statistics: Vol. 2A. Classical inference and the linear model (6th ed.). London: Arnold.

  151. Trafimow, D. (2003). Hypothesis testing and theory evaluation at the boundaries: Surprising insights from Bayes’s theorem. Psychological Review, 110, 526–535.

  152. Vickers, D., Lee, M. D., Dry, M., & Hughes, P. (2003). The roles of the convex hull and the number of potential intersections in performance on visually presented traveling salesperson problems. Memory & Cognition, 31, 1094–1104.

  153. Wagenmakers, E.-J. (2003). How many parameters does it take to fit an elephant? [Book review]. Journal of Mathematical Psychology, 47, 580–586.

  154. Wagenmakers, E.-J., & Farrell, S. (2004). AIC model selection using Akaike weights. Psychonomic Bulletin & Review, 11, 192–196.

  155. Wagenmakers, E.-J., & Grünwald, P. (2006). A Bayesian perspective on hypothesis testing: A comment on Killeen (2005). Psychological Science, 17, 641–642.

  156. Wagenmakers, E.-J., Grünwald, P., & Steyvers, M. (2006). Accumulative prediction error and the selection of time series models. Journal of Mathematical Psychology, 50, 149–166.

  157. Wagenmakers, E.-J., Ratcliff, R., Gomez, P., & Iverson, G. J. (2004). Assessing model mimicry using the parametric bootstrap. Journal of Mathematical Psychology, 48, 28–50.

  158. Wagenmakers, E.-J., & Waldorp, L. (Eds.) (2006). Model selection: Theoretical developments and applications [Special issue]. Journal of Mathematical Psychology, 50(2).

  159. Wainer, H. (1999). One cheer for null hypothesis significance testing. Psychological Methods, 4, 212–213.

  160. Wallace, C. S., & Dowe, D. L. (1999). Refinements of MDL and MML coding. Computer Journal, 42, 330–337.

  161. Ware, J. H. (1989). Investigating therapies of potentially great benefit: ECMO. Statistical Science, 4, 298–340.

  162. Wasserman, L. (2000). Bayesian model selection and model averaging. Journal of Mathematical Psychology, 44, 92–107.

  163. Wasserman, L. (2004). All of statistics: A concise course in statistical inference. New York: Springer.

  164. Weakliem, D. L. (1999). A critique of the Bayesian information criterion for model selection. Sociological Methods & Research, 27, 359–397.

  165. Wilkinson, L., & the Task Force on Statistical Inference (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.

  166. Winship, C. (1999). Editor’s introduction to the special issue on the Bayesian information criterion. Sociological Methods & Research, 27, 355–358.

  167. Xie, Y. (1999). The tension between generality and accuracy. Sociological Methods & Research, 27, 428–435.

Author information

Corresponding author

Correspondence to Eric-Jan Wagenmakers.

Additional information

This research was supported by a Veni Grant from the Dutch Organization for Scientific Research (NWO). I thank Scott Brown, Peter Dixon, Simon Farrell, Raoul Grasman, Geoff Iverson, Michael Lee, Martijn Meeter, Jay Myung, Jeroen Raaijmakers, Jeff Rouder, and Rich Shiffrin for helpful comments on earlier drafts of this article. Mark Steyvers convinced me that this article would be seriously incomplete without a consideration of practical alternatives to the p value methodology.

About this article

Cite this article

Wagenmakers, EJ. A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review 14, 779–804 (2007). https://doi.org/10.3758/BF03194105

Keywords

  • Null Hypothesis
  • Posterior Probability
  • Prior Distribution
  • Bayesian Information Criterion
  • Statistical Inference