A practical solution to the pervasive problems of p values
Abstract
In the field of psychology, the practice of p value null-hypothesis testing is as widespread as ever. Despite this popularity, or perhaps because of it, most psychologists are not aware of the statistical peculiarities of the p value procedure. In particular, p values are based on data that were never observed, and these hypothetical data are themselves influenced by subjective intentions. Moreover, p values do not quantify statistical evidence. This article reviews these p value problems and illustrates each problem with concrete examples. The three problems are familiar to statisticians but may be new to psychologists. A practical solution to these p value problems is to adopt a model selection perspective and use the Bayesian information criterion (BIC) for statistical inference (Raftery, 1995). The BIC provides an approximation to a Bayesian hypothesis test, does not require the specification of priors, and can be easily calculated from SPSS output.
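As a rough illustration of how a BIC-based test might be computed from the sums of squares that SPSS reports, the sketch below assumes two nested linear models with normally distributed errors and equal prior odds for the two hypotheses. The function name `bic_bayes_factor` and its parameters are hypothetical and chosen only for this example; the abstract itself does not specify a formula, so this should be read as one common form of the BIC approximation rather than as the article's exact procedure.

```python
import math


def bic_bayes_factor(sse_null, sse_alt, n, k_null, k_alt):
    """Approximate the Bayes factor BF01 from a BIC difference.

    Assumes nested normal linear models, where
      sse_null, sse_alt : error sums of squares under H0 and H1
      n                 : number of observations
      k_null, k_alt     : number of free parameters in each model
    """
    # For normal linear models, BIC = n * ln(SSE / n) + k * ln(n) up to a
    # constant shared by both models, so the difference BIC(H1) - BIC(H0) is:
    delta_bic = n * math.log(sse_alt / sse_null) + (k_alt - k_null) * math.log(n)

    # A positive difference means H1 has the larger BIC, which favors H0.
    bf01 = math.exp(delta_bic / 2.0)

    # Posterior probability of H0, assuming H0 and H1 are equally likely a priori.
    post_h0 = bf01 / (1.0 + bf01)
    return delta_bic, bf01, post_h0


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    delta, bf01, post_h0 = bic_bayes_factor(
        sse_null=100.0, sse_alt=100.0, n=100, k_null=1, k_alt=2
    )
    print(f"delta BIC = {delta:.3f}, BF01 = {bf01:.3f}, Pr(H0 | D) = {post_h0:.3f}")
```

With the made-up inputs above, the extra parameter buys no reduction in error, so the penalty term ln(100) alone gives a BIC difference of about 4.61 and a Bayes factor of 10 in favor of the null hypothesis.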
Keywords
Null Hypothesis, Posterior Probability, Prior Distribution, Bayesian Information Criterion, Statistical Inference
References
- Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.
- Anscombe, F. J. (1954). Fixed-sample-size analysis of sequential observations. Biometrics, 10, 89–100.
- Anscombe, F. J. (1963). Sequential medical trials. Journal of the American Statistical Association, 58, 365–383.
- Armitage, P. (1957). Restricted sequential procedures. Biometrika, 44, 9–26.
- Armitage, P. (1960). Sequential medical trials. Springfield, IL: Thomas.
- Armitage, P., McPherson, C. K., & Rowe, B. C. (1969). Repeated significance tests on accumulating data. Journal of the Royal Statistical Society: Series A, 132, 235–244.
- Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423–437.
- Barnard, G. A. (1947). The meaning of a significance level. Biometrika, 34, 179–182.
- Basu, D. (1964). Recovery of ancillary information. Sankhya: Series A, 26, 3–16.
- Bayarri, M.-J., & Berger, J. O. (2004). The interplay of Bayesian and frequentist analysis. Statistical Science, 19, 58–80.
- Berger, J. O. (1985). Statistical decision theory and Bayesian analysis (2nd ed.). New York: Springer.
- Berger, J. O. (2003). Could Fisher, Jeffreys and Neyman have agreed on testing? Statistical Science, 18, 1–32.
- Berger, J. O., & Berry, D. A. (1988a). The relevance of stopping rules in statistical inference. In S. S. Gupta & J. O. Berger (Eds.), Statistical decision theory and related topics IV (Vol. 1, pp. 29–72). New York: Springer.
- Berger, J. O., & Berry, D. A. (1988b). Statistical analysis and the illusion of objectivity. American Scientist, 76, 159–165.
- Berger, J. O., Boukai, B., & Wang, Y. (1997). Unified frequentist and Bayesian testing of a precise hypothesis (with discussion). Statistical Science, 12, 133–160.
- Berger, J. O., Brown, L., & Wolpert, R. (1994). A unified conditional frequentist and Bayesian test for fixed and sequential hypothesis testing. Annals of Statistics, 22, 1787–1807.
- Berger, J. O., & Delampady, M. (1987). Testing precise hypotheses. Statistical Science, 2, 317–352.
- Berger, J. O., & Mortera, J. (1999). Default Bayes factors for nonnested hypothesis testing. Journal of the American Statistical Association, 94, 542–554.
- Berger, J. O., & Pericchi, L. R. (1996). The intrinsic Bayes factor for model selection and prediction. Journal of the American Statistical Association, 91, 109–122.
- Berger, J. O., & Sellke, T. (1987). Testing a point null hypothesis: The irreconcilability of p values and evidence. Journal of the American Statistical Association, 82, 112–139.
- Berger, J. O., & Wolpert, R. L. (1988). The likelihood principle (2nd ed.). Hayward, CA: Institute of Mathematical Statistics.
- Bernardo, J. M., & Smith, A. F. M. (1994). Bayesian theory. Chichester, U.K.: Wiley.
- Birnbaum, A. (1962). On the foundations of statistical inference (with discussion). Journal of the American Statistical Association, 53, 259–326.
- Birnbaum, A. (1977). The Neyman–Pearson theory as decision theory, and as inference theory; with a criticism of the Lindley–Savage argument for Bayesian theory. Synthese, 36, 19–49.
- Box, G. E. P., & Tiao, G. C. (1973). Bayesian inference in statistical analysis. Reading, MA: Addison-Wesley.
- Browne, M. (2000). Cross-validation methods. Journal of Mathematical Psychology, 44, 108–132.
- Burdette, W. J., & Gehan, E. A. (1970). Planning and analysis of clinical studies. Springfield, IL: Thomas.
- Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach (2nd ed.). New York: Springer.
- Busemeyer, J. R., & Stout, J. C. (2002). A contribution of cognitive decision models to clinical assessment: Decomposing performance on the Bechara gambling task. Psychological Assessment, 14, 253–262.
- Christensen, R. (2005). Testing Fisher, Neyman, Pearson, and Bayes. American Statistician, 59, 121–126.
- Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
- Cornfield, J. (1966). Sequential trials, sequential analysis, and the likelihood principle. American Statistician, 20, 18–23.
- Cornfield, J. (1969). The Bayesian outlook and its application. Biometrics, 25, 617–657.
- Cortina, J. M., & Dunlap, W. P. (1997). On the logic and purpose of significance testing. Psychological Methods, 2, 161–172.
- Cox, D. R. (1958). Some problems connected with statistical inference. Annals of Mathematical Statistics, 29, 357–372.
- Cox, D. R. (1971). The choice between alternative ancillary statistics. Journal of the Royal Statistical Society: Series B, 33, 251–255.
- Cox, R. T. (1946). Probability, frequency and reasonable expectation. American Journal of Physics, 14, 1–13.
- Cumming, G. (2007). Replication and p values: p values predict the future vaguely, but confidence intervals do better. Manuscript submitted for publication.
- D’Agostini, G. (1999). Teaching statistics in the physics curriculum: Unifying and clarifying role of subjective probability. American Journal of Physics, 67, 1260–1268.
- Dawid, A. P. (1984). Statistical theory: The prequential approach. Journal of the Royal Statistical Society: Series A, 147, 278–292.
- De Finetti, B. (1974). Theory of probability: A critical introductory treatment (Vols. 1 & 2; A. Machí & A. Smith, Trans.). London: Wiley.
- Diamond, G. A., & Forrester, J. S. (1983). Clinical trials and statistical verdicts: Probable grounds for appeal. Annals of Internal Medicine, 98, 385–394.
- Dickey, J. M. (1973). Scientific reporting and personal probabilities: Student’s hypothesis. Journal of the Royal Statistical Society: Series B, 35, 285–305.
- Dickey, J. M. (1977). Is the tail area useful as an approximate Bayes factor? Journal of the American Statistical Association, 72, 138–142.
- Dixon, P. (2003). The p value fallacy and how to avoid it. Canadian Journal of Experimental Psychology, 57, 189–202.
- Djurić, P. M. (1998). Asymptotic MAP criteria for model selection. IEEE Transactions on Signal Processing, 46, 2726–2735.
- Edwards, A. W. F. (1992). Likelihood. Baltimore: Johns Hopkins University Press.
- Edwards, W., Lindman, H., & Savage, L. J. (1963). Bayesian statistical inference for psychological research. Psychological Review, 70, 193–242.
- Efron, B. (2005). Bayesians, frequentists, and scientists. Journal of the American Statistical Association, 100, 1–5.
- Efron, B., & Tibshirani, R. (1997). Improvements on cross-validation: The .632+ bootstrap method. Journal of the American Statistical Association, 92, 548–560.
- Feller, W. (1940). Statistical aspects of ESP. Journal of Parapsychology, 4, 271–298.
- Feller, W. (1970). An introduction to probability theory and its applications: Vol. 1 (2nd ed.). New York: Wiley.
- Fine, T. L. (1973). Theories of probability: An examination of foundations. New York: Academic Press.
- Firth, D., & Kuha, J. (1999). Comments on “A critique of the Bayesian information criterion for model selection.” Sociological Methods & Research, 27, 398–402.
- Fisher, R. A. (1934). Statistical methods for research workers (5th ed.). London: Oliver & Boyd.
- Fisher, R. A. (1935a). The design of experiments. Edinburgh: Oliver & Boyd.
- Fisher, R. A. (1935b). The logic of inductive inference (with discussion). Journal of the Royal Statistical Society, 98, 39–82.
- Fisher, R. A. (1958). Statistical methods for research workers (13th ed.). New York: Hafner.
- Freireich, E. J., Gehan, E., Frei, E., III, Schroeder, L. R., Wolman, I. J., Anbari, R., et al. (1963). The effect of 6-mercaptopurine on the duration of steroid-induced remissions in acute leukemia: A model for evaluation of other potentially useful therapy. Blood, 21, 699–716.
- Frick, R. W. (1996). The appropriate use of null hypothesis testing. Psychological Methods, 1, 379–390.
- Friedman, L. M., Furberg, C. D., & DeMets, D. L. (1998). Fundamentals of clinical trials (3rd ed.). New York: Springer.
- Galavotti, M. C. (2005). A philosophical introduction to probability. Stanford: CSLI Publications.
- Geisser, S. (1975). The predictive sample reuse method with applications. Journal of the American Statistical Association, 70, 320–328.
- Gelman, A., & Rubin, D. B. (1999). Evaluating and using statistical methods in the social sciences. Sociological Methods & Research, 27, 403–410.
- Gigerenzer, G. (1993). The superego, the ego, and the id in statistical reasoning. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 311–339). Hillsdale, NJ: Erlbaum.
- Gigerenzer, G. (1998). We need statistical thinking, not statistical rituals. Behavioral & Brain Sciences, 21, 199–200.
- Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (Eds.) (1996). Markov chain Monte Carlo in practice. Boca Raton, FL: Chapman & Hall/CRC.
- Gill, J. (2002). Bayesian methods: A social and behavioral sciences approach. Boca Raton, FL: CRC Press.
- Glover, S., & Dixon, P. (2004). Likelihood ratios: A simple and flexible statistic for empirical psychologists. Psychonomic Bulletin & Review, 11, 791–806.
- Good, I. J. (1983). Good thinking: The foundations of probability and its applications. Minneapolis: University of Minnesota Press.
- Good, I. J. (1985). Weight of evidence: A brief survey. In J. M. Bernardo, M. H. DeGroot, D. V. Lindley, & A. F. M. Smith (Eds.), Bayesian statistics 2: Proceedings of the Second Valencia International Meeting, September 6/10, 1983 (pp. 249–269). Amsterdam: North-Holland.
- Goodman, S. N. (1993). p values, hypothesis tests, and likelihood: Implications for epidemiology of a neglected historical debate. American Journal of Epidemiology, 137, 485–496.
- Grünwald, P. [D.] (2000). Model selection based on minimum description length. Journal of Mathematical Psychology, 44, 133–152.
- Grünwald, P. D., Myung, I. J., & Pitt, M. A. (Eds.) (2005). Advances in minimum description length: Theory and applications. Cambridge, MA: MIT Press.
- Hacking, I. (1965). Logic of statistical inference. Cambridge: Cambridge University Press.
- Hagen, R. L. (1997). In praise of the null hypothesis statistical test. American Psychologist, 52, 15–24.
- Haldane, J. B. S. (1945). On a method of estimating frequencies. Biometrika, 33, 222–225.
- Hannan, E. J. (1980). The estimation of the order of an ARMA process. Annals of Statistics, 8, 1071–1081.
- Helland, I. S. (1995). Simple counterexamples against the conditionality principle. American Statistician, 49, 351–356.
- Hill, B. M. (1985). Some subjective Bayesian considerations in the selection of models. Econometric Reviews, 4, 191–246.
- Howson, C., & Urbach, P. (2005). Scientific reasoning: The Bayesian approach (3rd ed.). Chicago: Open Court.
- Hubbard, R., & Bayarri, M.-J. (2003). Confusion over measures of evidence (p’s) versus errors (α’s) in classical statistical testing. American Statistician, 57, 171–182.
- Jaynes, E. T. (1968). Prior probabilities. IEEE Transactions on Systems Science & Cybernetics, 4, 227–241.
- Jaynes, E. T. (2003). Probability theory: The logic of science. Cambridge: Cambridge University Press.
- Jeffreys, H. (1961). Theory of probability. Oxford: Oxford University Press.
- Jennison, C., & Turnbull, B. W. (1990). Statistical approaches to interim monitoring of medical trials: A review and commentary. Statistical Science, 5, 299–317.
- Kadane, J. B., Schervish, M. J., & Seidenfeld, T. (1996). Reasoning to a foregone conclusion. Journal of the American Statistical Association, 91, 1228–1235.
- Karabatsos, G. (2006). Bayesian nonparametric model selection and model testing. Journal of Mathematical Psychology, 50, 123–148.
- Kass, R. E. (1993). Bayes factors in practice. Statistician, 42, 551–560.
- Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 377–395.
- Kass, R. E., & Wasserman, L. (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association, 90, 928–934.
- Kass, R. E., & Wasserman, L. (1996). The selection of prior distributions by formal rules. Journal of the American Statistical Association, 91, 1343–1370.
- Killeen, P. R. (2005a). An alternative to null-hypothesis significance tests. Psychological Science, 16, 345–353.
- Killeen, P. R. (2005b). Replicability, confidence, and priors. Psychological Science, 16, 1009–1012.
- Killeen, P. R. (2006). Beyond statistical inference: A decision theory for science. Psychonomic Bulletin & Review, 13, 549–562.
- Klugkist, I., Laudy, O., & Hoijtink, H. (2005). Inequality constrained analysis of variance: A Bayesian approach. Psychological Methods, 10, 477–493.
- Lee, M. D. (2002). Generating additive clustering models with limited stochastic complexity. Journal of Classification, 19, 69–85.
- Lee, M. D., & Pope, K. J. (2006). Model selection for the rate problem: A comparison of significance testing, Bayesian, and minimum description length statistical inference. Journal of Mathematical Psychology, 50, 193–202.
- Lee, M. D., & Wagenmakers, E.-J. (2005). Bayesian statistical inference in psychology: Comment on Trafimow (2003). Psychological Review, 112, 662–668.
- Lee, P. M. (1989). Bayesian statistics: An introduction. New York: Oxford University Press.
- Lindley, D. V. (1957). A statistical paradox. Biometrika, 44, 187–192.
- Lindley, D. V. (1972). Bayesian statistics: A review. Philadelphia: Society for Industrial & Applied Mathematics.
- Lindley, D. V. (1977). The distinction between inference and decision. Synthese, 36, 51–58.
- Lindley, D. V. (1982). Scoring rules and the inevitability of probability. International Statistical Review, 50, 1–26.
- Lindley, D. V. (1993). The analysis of experimental data: The appreciation of tea and wine. Teaching Statistics, 15, 22–25.
- Lindley, D. V. (2004). That wretched prior. Significance, 1, 85–87.
- Lindley, D. V., & Phillips, L. D. (1976). Inference for a Bernoulli process (a Bayesian view). American Statistician, 30, 112–119.
- Lindley, D. V., & Scott, W. F. (1984). New Cambridge elementary statistical tables. Cambridge: Cambridge University Press.
- Loftus, G. R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 5, 161–171.
- Loftus, G. R. (2002). Analysis, interpretation, and visual presentation of experimental data. In H. Pashler (Ed. in Chief) & J. Wixted (Vol. Ed.), Stevens’ Handbook of experimental psychology: Vol. 4. Methodology in experimental psychology (3rd ed., pp. 339–390). New York: Wiley.
- Ludbrook, J. (2003). Interim analyses of data as they accumulate in laboratory experimentation. BMC Medical Research Methodology, 3, 15.
- McCarroll, D., Crays, N., & Dunlap, W. P. (1992). Sequential ANOVAs and Type I error rates. Educational & Psychological Measurement, 52, 387–393.
- Myung, I. J. (2000). The importance of complexity in model selection. Journal of Mathematical Psychology, 44, 190–204.
- Myung, I. J., Forster, M. R., & Browne, M. W. (Eds.) (2000). Model selection [Special issue]. Journal of Mathematical Psychology, 44(1).
- Myung, I. J., Navarro, D. J., & Pitt, M. A. (2006). Model selection by normalized maximum likelihood. Journal of Mathematical Psychology, 50, 167–179.
- Myung, I. J., & Pitt, M. A. (1997). Applying Occam’s razor in modeling cognition: A Bayesian approach. Psychonomic Bulletin & Review, 4, 79–95.
- Nelson, N., Rosenthal, R., & Rosnow, R. L. (1986). Interpretation of significance levels and effect sizes by psychological researchers. American Psychologist, 41, 1299–1301.
- Neyman, J. (1977). Frequentist probability and frequentist statistics. Synthese, 36, 97–131.
- Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society: Series A, 231, 289–337.
- Nickerson, R. S. (2000). Null hypothesis statistical testing: A review of an old and continuing controversy. Psychological Methods, 5, 241–301.
- O’Hagan, A. (1997). Fractional Bayes factors for model comparison. Journal of the Royal Statistical Society: Series B, 57, 99–138.
- O’Hagan, A. (2004). Dicing with the unknown. Significance, 1, 132–133.
- O’Hagan, A., & Forster, J. (2004). Kendall’s advanced theory of statistics: Vol. 2B. Bayesian inference (2nd ed.). London: Arnold.
- Pauler, D. K. (1998). The Schwarz criterion and related methods for normal linear models. Biometrika, 85, 13–27.
- Peto, R., Pike, M. C., Armitage, P., Breslow, N. E., Cox, D. R., Howard, S. V., et al. (1976). Design and analysis of randomized clinical trials requiring prolonged observation of each patient: I. Introduction and design. British Journal of Cancer, 34, 585–612.
- Pitt, M. A., Myung, I. J., & Zhang, S. (2002). Toward a method of selecting among computational models of cognition. Psychological Review, 109, 472–491.
- Pocock, S. J. (1983). Clinical trials: A practical approach. New York: Wiley.
- Pratt, J. W. (1961). [Review of Lehmann, E. L., Testing statistical hypotheses]. Journal of the American Statistical Association, 56, 163–167.
- Pratt, J. W. (1962). On the foundations of statistical inference: Discussion. Journal of the American Statistical Association, 57, 314–315.
- Raftery, A. E. (1993). Bayesian model selection in structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 163–180). Newbury Park, CA: Sage.
- Raftery, A. E. (1995). Bayesian model selection in social research. In P. V. Marsden (Ed.), Sociological methodology 1995 (pp. 111–196). Cambridge, MA: Blackwell.
- Raftery, A. E. (1996). Hypothesis testing and model selection. In W. R. Gilks, S. Richardson, & D. J. Spiegelhalter (Eds.), Markov chain Monte Carlo in practice (pp. 163–187). Boca Raton, FL: Chapman & Hall/CRC.
- Raftery, A. E. (1999). Bayes factors and BIC. Sociological Methods & Research, 27, 411–427.
- Rissanen, J. (2001). Strong optimality of the normalized ML models as universal codes and information in data. IEEE Transactions on Information Theory, 47, 1712–1717.
- Robert, C. P., & Casella, G. (1999). Monte Carlo statistical methods. New York: Springer.
- Rosenthal, R., & Gaito, J. (1963). The interpretation of levels of significance by psychological researchers. Journal of Psychology, 55, 33–38.
- Rouder, J. N., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an application in the theory of signal detection. Psychonomic Bulletin & Review, 12, 573–604.
- Rouder, J. N., Lu, J., Speckman, P., Sun, D., & Jiang, Y. (2005). A hierarchical model for estimating response time distributions. Psychonomic Bulletin & Review, 12, 195–223.
- Royall, R. M. (1997). Statistical evidence: A likelihood paradigm. London: Chapman & Hall.
- Savage, L. J. (1954). The foundations of statistics. New York: Wiley.
- Schervish, M. J. (1996). P values: What they are and what they are not. American Statistician, 50, 203–206.
- Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1, 115–129.
- Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.
- Sellke, T., Bayarri, M.-J., & Berger, J. O. (2001). Calibration of p values for testing precise null hypotheses. American Statistician, 55, 62–71.
- Shafer, G. (1982). Lindley’s paradox. Journal of the American Statistical Association, 77, 325–351.
- Siegmund, D. (1985). Sequential analysis: Tests and confidence intervals. New York: Springer.
- Smith, A. F. M., & Spiegelhalter, D. J. (1980). Bayes factors and choice criteria for linear models. Journal of the Royal Statistical Society: Series B, 42, 213–220.
- Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions (with discussion). Journal of the Royal Statistical Society: Series B, 36, 111–147.
- Strube, M. J. (2006). SNOOP: A program for demonstrating the consequences of premature and repeated null hypothesis testing. Behavior Research Methods, 38, 24–27.
- Stuart, A., Ord, J. K., & Arnold, S. (1999). Kendall’s advanced theory of statistics: Vol. 2A. Classical inference and the linear model (6th ed.). London: Arnold.
- Trafimow, D. (2003). Hypothesis testing and theory evaluation at the boundaries: Surprising insights from Bayes’s theorem. Psychological Review, 110, 526–535.
- Vickers, D., Lee, M. D., Dry, M., & Hughes, P. (2003). The roles of the convex hull and the number of potential intersections in performance on visually presented traveling salesperson problems. Memory & Cognition, 31, 1094–1104.
- Wagenmakers, E.-J. (2003). How many parameters does it take to fit an elephant? [Book review]. Journal of Mathematical Psychology, 47, 580–586.
- Wagenmakers, E.-J., & Farrell, S. (2004). AIC model selection using Akaike weights. Psychonomic Bulletin & Review, 11, 192–196.
- Wagenmakers, E.-J., & Grünwald, P. (2006). A Bayesian perspective on hypothesis testing: A comment on Killeen (2005). Psychological Science, 17, 641–642.
- Wagenmakers, E.-J., Grünwald, P., & Steyvers, M. (2006). Accumulative prediction error and the selection of time series models. Journal of Mathematical Psychology, 50, 149–166.
- Wagenmakers, E.-J., Ratcliff, R., Gomez, P., & Iverson, G. J. (2004). Assessing model mimicry using the parametric bootstrap. Journal of Mathematical Psychology, 48, 28–50.
- Wagenmakers, E.-J., & Waldorp, L. (Eds.) (2006). Model selection: Theoretical developments and applications [Special issue]. Journal of Mathematical Psychology, 50(2).
- Wainer, H. (1999). One cheer for null hypothesis significance testing. Psychological Methods, 4, 212–213.
- Wallace, C. S., & Dowe, D. L. (1999). Refinements of MDL and MML coding. Computer Journal, 42, 330–337.
- Ware, J. H. (1989). Investigating therapies of potentially great benefit: ECMO. Statistical Science, 4, 298–340.
- Wasserman, L. (2000). Bayesian model selection and model averaging. Journal of Mathematical Psychology, 44, 92–107.
- Wasserman, L. (2004). All of statistics: A concise course in statistical inference. New York: Springer.
- Weakliem, D. L. (1999). A critique of the Bayesian information criterion for model selection. Sociological Methods & Research, 27, 359–397.
- Wilkinson, L., & the Task Force on Statistical Inference (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.
- Winship, C. (1999). Editor’s introduction to the special issue on the Bayesian information criterion. Sociological Methods & Research, 27, 355–358.
- Xie, Y. (1999). The tension between generality and accuracy. Sociological Methods & Research, 27, 428–435.