Biological Theory, Volume 6, Issue 2, pp 154–161

Randomization and Rules for Causal Inferences in Biology: When the Biological Emperor (Significance Testing) Has No Clothes

Long Article


Why do classic biostatistical studies, alleged to provide causal explanations of effects, often fail? This article argues that in statistics-relevant areas of biology, such as epidemiology, population biology, toxicology, and vector ecology, scientists often misunderstand the epistemic constraints on the use of the statistical-significance rule (SSR). As a result, biologists often make faulty causal inferences. The paper (1) provides several examples of faulty causal inferences that rely on tests of statistical significance; (2) uncovers the flawed theoretical assumptions, especially those related to randomization, that likely contribute to flawed biostatistics; (3) reassesses the three classic arguments (SSR-warrant, avoiding selection bias, and avoiding confounders) for using SSR only with randomization; and (4) offers five new reasons for biologists to use SSR only with randomized experiments.


Keywords: Alcohol · Biostatistics · Causal inference · Epidemiology · Experimental study · Observational study · Randomization · Statistics · Tobacco

Copyright information

© Konrad Lorenz Institute for Evolution and Cognitive Research 2012

Authors and Affiliations

  1. Department of Biological Sciences, University of Notre Dame, Notre Dame, USA
  2. Department of Philosophy, University of Notre Dame, Notre Dame, USA