The null hypothesis is not called that for nothing: statistical tests in randomized trials

Journal of Experimental Criminology

Abstract

This article aims to update readers on different ways to arrange one’s thinking about conventional null hypotheses in randomized trials. It covers basic criticism of conventional hypotheses and, beyond this, relevant developments in methodological, organizational, and science policy arenas. It also covers new ways to frame null hypotheses, new technical resources, standards for registering trials and reporting on them, cumulating results, common mistakes, and post-trial analysis of null results. The article includes ideas for research and development on each topic.

Notes

  1. A version of this paper was presented at the Fifth Annual (2003) Jerry Lee Criminology Colloquium at the University of Maryland. It is a delight to acknowledge the University’s and Jerry Lee’s support of this effort, and David Weisburd’s and Larry Sherman’s organization of it. I am indebted to the Journal of Experimental Criminology (JEC) editor and to the anonymous reviewers for comments that improved this paper. I have also benefited from colleagues involved in the IES What Works Clearinghouse at the U.S. Department of Education, the American Institutes for Research, the Campbell Collaboration, and the Cochrane Collaboration. Nobody except the author should be blamed for errors of commission or omission in this paper.

Author information

Correspondence to Robert Boruch.

About this article

Cite this article

Boruch, R. The null hypothesis is not called that for nothing: statistical tests in randomized trials. J Exp Criminol 3, 1–20 (2007). https://doi.org/10.1007/s11292-007-9026-0
