Abstract
This article updates readers on different ways to think about conventional null hypotheses in randomized trials. It covers basic criticism of conventional hypotheses and, beyond this, relevant developments in the methodological, organizational, and science policy arenas. Topics include new ways to frame null hypotheses, new technical resources, standards for registering and reporting trials, cumulating results, common mistakes, and post-trial analysis of null results. The paper offers ideas for research and development on each topic.
Notes
A version of this paper was presented at the Fifth Annual (2003) Jerry Lee Criminology Colloquium at the University of Maryland. It is a delight to recognize the University’s and Jerry Lee’s support of this effort, and David Weisburd’s and Larry Sherman’s organization of it. I am indebted to the Journal of Experimental Criminology (JEC) editor and to the anonymous reviewers for comments that improved this paper. I have also benefited from colleagues involved in the IES What Works Clearinghouse at the U.S. Department of Education, the American Institutes for Research, the Campbell Collaboration, and the Cochrane Collaboration. Nobody except the author should be blamed for errors of commission or omission in this paper.
Cite this article
Boruch, R. The null hypothesis is not called that for nothing: statistical tests in randomized trials. J Exp Criminol 3, 1–20 (2007). https://doi.org/10.1007/s11292-007-9026-0
DOI: https://doi.org/10.1007/s11292-007-9026-0