Type M Error Might Explain Weisburd’s Paradox



Simple calculations seem to show that larger studies should have higher statistical power, but empirical meta-analyses of published work in criminology have found zero or weak correlations between sample size and estimated statistical power. This is “Weisburd’s paradox.” Weisburd et al. (in Crime Justice 17:337–379, 1993) attributed it to a difficulty in maintaining quality control as studies get larger, and Nelson et al. (in J Exp Criminol 11:141–163, 2015) attributed it to a negative correlation between sample sizes and the underlying sizes of the effects being measured. We argue that neither explanation is necessary, and suggest instead that the apparent Weisburd paradox might be explainable as an artifact of the systematic overestimation inherent in post-hoc power calculations, a bias that is large when N is small.


We discuss Weisburd’s paradox in light of the concepts of type S and type M errors, and re-examine the publications used in previous studies of the so-called paradox.


We suggest that the apparent Weisburd paradox might be explainable as an artifact of the systematic overestimation inherent in post-hoc power calculations, a bias that is large when N is small.
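The overestimation described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' analysis: it assumes normally distributed estimates from a two-arm comparison, a hypothetical true effect of 0.1 standard deviations, and a hypothetical reporting filter that keeps only statistically significant results. The names `power` and `simulate` and all parameter values are illustrative.

```python
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()
Z_CRIT = STD_NORMAL.inv_cdf(0.975)  # two-sided 5% cutoff, roughly 1.96


def power(effect, se):
    """Two-sided power of a z-test if the true effect equals `effect`."""
    shift = effect / se
    return STD_NORMAL.cdf(shift - Z_CRIT) + STD_NORMAL.cdf(-shift - Z_CRIT)


def simulate(n_per_arm, true_effect=0.1, sd=1.0, reps=20000, seed=0):
    """Simulate two-arm studies and summarize only the significant ones."""
    rng = random.Random(seed)
    se = sd * (2.0 / n_per_arm) ** 0.5  # standard error of a difference in means
    posthoc, exaggeration = [], []
    for _ in range(reps):
        estimate = rng.gauss(true_effect, se)
        # Assumed filter: only significant results are reported.
        if abs(estimate) > Z_CRIT * se:
            # Post-hoc power plugs the noisy estimate in as if it were the truth.
            posthoc.append(power(abs(estimate), se))
            # Exaggeration ratio: the quantity behind a type M error.
            exaggeration.append(abs(estimate) / true_effect)
    return {
        "true_power": power(true_effect, se),
        "posthoc_power": mean(posthoc),
        "type_m": mean(exaggeration),
    }


if __name__ == "__main__":
    for n in (50, 200, 1000, 5000):
        r = simulate(n)
        print(n, round(r["true_power"], 2), round(r["posthoc_power"], 2),
              round(r["type_m"], 1))
```

Under these assumptions, post-hoc power among significant results cannot fall below about 0.5, because a just-significant estimate treated as the true effect implies roughly 50% power. Estimated power therefore varies much less with sample size than true power does, and the exaggeration ratio is largest for the smallest studies.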


Speaking more generally, we recommend abandoning the use of statistical power as a measure of the strength of a study, because implicit in the definition of power is the bad idea of statistical significance as a research goal.


  1. Our summary is at http://www.stat.columbia.edu/~gelman/documents/weisburd_table_of_studies.pdf.


  1. Brame R, Bushway S, Paternoster R, Turner M (2014) Demographic patterns of cumulative arrest prevalence by ages 18 and 23. Crime Delinquency 60:471–486

  2. Button KS, Ioannidis JPA, Mokrysz C, Nosek BA, Flint J, Robinson ESJ, Munafo MR (2013) Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci 14:365–376

  3. Carroll KM, Easton CJ, Nich C, Hunkele KA, Neavins TM, Sinha R, Ford HL, Vitolo SA, Doebrick CA, Rounsaville BJ (2006) The use of contingency management and motivational/skills-building therapy to treat young adults with marijuana dependence. J Consult Clin Psychol 74:955–966

  4. Carroll KM, Martino S, Ball SA, Nich C, Frankforter T, Anez LM, Paris M, Suarez-Morales L, Szapocznik J, Miller WR, Rosa C, Matthews J, Farentinos C (2009) A multi-site randomised effectiveness trial of motivational enhancement therapy for Spanish-speaking substance users. J Consult Clin Psychol 77(5):993–999

  5. Deschenes EP, Turner S, Greenwood PW (1995) Drug court or probation?: An experimental evaluation of Maricopa County’s drug court. Justice Syst J 18:55–73

  6. Franco A, Malhotra N, Simonovits G (2014) Publication bias in the social sciences: unlocking the file drawer. Science 345:1502–1505

  7. Gerber AS, Malhotra N (2008a) Publication bias in empirical sociological research: Do arbitrary significance levels distort published results? Sociol Methods Res 37:3–30

  8. Gerber AS, Malhotra N (2008b) Do statistical reporting standards affect what is published? Publication bias in two leading political science journals. Q J Polit Sci 3:313–326

  9. Gelman A (2015) Statistics and the crisis of scientific replication. Significance 12(3):23–25

  10. Gelman A, Carlin JB (2014) Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors. Perspect Psychol Sci 9:641–651

  11. Gelman A, Loken E (2014) The statistical crisis in science. Am Sci 102:460–465

  12. Gelman A, Tuerlinckx F (2000) Type S error rates for classical and Bayesian single and multiple comparison procedures. Comput Stat 15:373–390

  13. Ginsel B, Aggarwal A, Xuan W, Harris I (2015) The distribution of probability values in medical abstracts: an observational study. BMC Res Notes 8:721

  14. Jager LR, Leek JT (2014) An estimate of the science-wise false discovery rate and application to the top medical literature. Biostatistics 15:1–12

  15. Lewis RV (1983) Scared straight—California style. Evaluation of the San Quentin squires program. Crim Justice Behav 10:209–226

  16. Masicampo EJ, Lalande D (2012) A peculiar prevalence of p values just below .05. Q J Exp Psychol 65:2271–2279

  17. Nelson MS, Wooditch A, Dario LM (2015) Sample size, effect size, and statistical power: a replication study of Weisburd’s paradox. J Exp Criminol 11:141–163

  18. Patrick S, Marsh R (2005) Juvenile diversion: results from a 3-year experimental study. Crim Justice Policy Rev 16:59–73

  19. Piquero AR, Jennings WG, Diamond B, Farrington DP, Tremblay RE, Welsh BC, Reingle Gonzalez JM (2016) A meta-analysis update on the effects of early family/parent training programs on antisocial behavior and delinquency. J Exp Criminol 12:229–248

  20. Rothstein H (2008) Publication bias as a threat to the validity of meta-analytic results. J Exp Criminol 4:61–81

  21. Senn SJ (2002) Power is indeed irrelevant in interpreting completed studies. Br Med J 325:1304

  22. Sherman LW (2007) The power few: experimental criminology and the reduction of harm. The 2006 Joan McCord Prize Lecture. J Exp Criminol 3:299–321

  23. Simmons JP, Nelson LD, Simonsohn U (2011) False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol Sci 22:1359–1366

  24. Slavin R, Smith D (2009) The relationship between sample sizes and effect sizes in systematic reviews in education. Educ Eval Policy An 31:500–506

  25. Weisburd D, Petrosino A, Mason G (1993) Design sensitivity in criminal justice experiments. Crime Justice 17:337–379

  26. Wilson SJ, Tanner-Smith EE, Lipsey MW, Steinka-Fry K, Morrison J (2011) Dropout prevention and intervention programs: effects on school completion and dropout among school-aged children and youth. Campbell Systematic Reviews 2011:8. Oslo: The Campbell Collaboration


National Science Foundation (Grant No. SES-1534414), Institute of Education Sciences (Grant No. R305D140059-16), Office of Naval Research (Grant No. N00014-15-1-2541), Defense Advanced Research Projects Agency (Grant No. DARPA BAA-16-32)

Author information



Corresponding author

Correspondence to Andrew Gelman.

Additional information

We thank Justin Pickett and Gary Sweeten for suggesting this topic, several reviewers for helpful comments, and the U.S. National Science Foundation, Institute of Education Sciences, Office of Naval Research, and Defense Advanced Research Projects Agency for partial support of this work.

About this article

Cite this article

Gelman, A., Skardhamar, T. & Aaltonen, M. Type M Error Might Explain Weisburd’s Paradox. J Quant Criminol 36, 295–304 (2020). https://doi.org/10.1007/s10940-017-9374-5


  • Weisburd paradox
  • Type M error
  • Statistical power
  • Publication bias