The null hypothesis is not called that for nothing: statistical tests in randomized trials

Boruch, Robert

doi:10.1007/s11292-007-9026-0

Published: 20 February 2007

Volume 3, pages 1–20, (2007)

Journal of Experimental Criminology

Robert Boruch¹

Abstract

This article aims to update readers on different ways to arrange one’s thinking about conventional null hypotheses in randomized trials. It covers basic criticism of conventional hypotheses and, beyond this, relevant developments in methodological, organizational, and science policy arenas. These include new ways to frame null hypotheses, new technical resources, standards for registering trials and reporting on them, cumulating results, common mistakes, and post-trial analysis of null results. The paper includes ideas for research and development on each topic.

Notes

A version of this paper was presented at the Fifth Annual (2003) Jerry Lee Criminology Colloquium at the University of Maryland. It is a delight to recognize the University’s and Jerry Lee’s support of this effort, and David Weisburd’s and Larry Sherman’s organization of it. I am indebted to the Journal of Experimental Criminology (JEC) editor and to the anonymous reviewers for comments that improved this paper. I have also benefited from colleagues involved in the IES What Works Clearinghouse at the U.S. Department of Education, the American Institutes for Research, the Campbell Collaboration, and the Cochrane Collaboration. Nobody except the author should be blamed for errors of commission or omission in this paper.

Author information

Authors and Affiliations

Center for Research and Evaluation in Social Policy (CRESP) and The Campbell Collaboration (C2), University of Pennsylvania Graduate School of Education, 3700 Walnut Street, Philadelphia, PA, 19104-6216, USA
Robert Boruch

Corresponding author

Correspondence to Robert Boruch.

About this article

Cite this article

Boruch, R. The null hypothesis is not called that for nothing: statistical tests in randomized trials. J Exp Criminol 3, 1–20 (2007). https://doi.org/10.1007/s11292-007-9026-0

Published: 20 February 2007
Issue Date: March 2007
DOI: https://doi.org/10.1007/s11292-007-9026-0
