acta ethologica, Volume 7, Issue 2, pp 103–108

The case against retrospective statistical power analyses with an introduction to power analysis

  • Shinichi Nakagawa
  • T. Mary Foster


Statistical power analysis is an important tool for planning an experiment because this type of analysis allows researchers to identify an appropriate sample size for a particular experimental design. In recent years, it seems many biology journals (see Table 1 in Hoenig and Heisey 2001) have been encouraging researchers to calculate statistical power after their experiments when they have obtained non-significant results (hereafter, termed “retrospective power calculation or analysis” as opposed to “prospective power analysis”, which is conducted pre-experimentally).

For example, a leading journal in the field of animal behaviour, Animal Behaviour, asks for retrospective power calculations as a matter of editorial policy, stating “where a significance test based on a small sample size yields a non-significant result, explicit consideration should be given to the power of the data for accepting the null hypothesis” (issue xiii, revised November 2004). However,...
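The prospective use of power analysis described above, choosing a sample size before the experiment is run, can be illustrated with a minimal sketch. It is not taken from the article: it assumes a two-sided comparison of two group means, a standardized effect size (Cohen's d), and the common normal approximation n = 2((z_{1-α/2} + z_{power}) / d)² per group. The function name `n_per_group` and its default values are hypothetical choices for illustration only.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample test.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2,
    where d is the standardized effect size (Cohen's d). An exact
    calculation based on the noncentral t distribution gives a slightly
    larger n, so treat this as a planning estimate only.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)          # quantile for target power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Cohen's conventional small, medium and large effect sizes.
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: about {n_per_group(d)} subjects per group")
```

Under these assumptions, a medium effect (d = 0.5) at α = 0.05 and 80% power calls for roughly 63 subjects per group, and the required sample size rises steeply as the expected effect gets smaller.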


Keywords: Statistical power analysis · Retrospective power analysis · Power approach paradox · Effect size



We gratefully acknowledge James McEwan, Richard Etheredge, Catherine Sumpter, Jens Rolff, and an anonymous referee for comments that have improved the manuscript. S. Nakagawa is supported by the Foundation for Research, Science and Technology, New Zealand.


  1. Berger RL, Hsu JC (1996) Bioequivalence trials, intersection-union tests and equivalence confidence sets. Stat Sci 11:283–319
  2. Carver RP (1978) The case against statistical significance testing. Harv Educ Rev 48:378–399
  3. Chow SL (1988) Significance test or effect size? Psychol Bull 103:105–110
  4. Cohen J (1962) The statistical power of abnormal social psychological research: a review. J Abnorm Soc Psychol 65:145–153
  5. Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Erlbaum, Hillsdale
  6. Cohen J (1990) Things I have learned (so far). Am Psychol 45:1304–1312
  7. Cohen J (1992) Statistical power analysis. Curr Dir Psychol Sci 1:98–101
  8. Cohen J (1994) The earth is round (P<0.05). Am Psychol 49:997–1003
  9. Colegrave N, Ruxton GD (2003) Confidence intervals are a more useful complement to nonsignificant tests than are power calculations. Behav Ecol 14:446–450
  10. Dayton PK (1998) Reversal of the burden of proof in fisheries management. Science 279:821–822
  11. Fairweather PG (1991) Statistical power and design requirements for environmental monitoring. Aust J Mar Freshwater Res 42:555–567
  12. Fisher RA (1935) The design of experiments. Hafner, New York
  13. Fleiss JL (1994) Measures of effect size for categorical data. In: Cooper H, Hedges LV (eds) The handbook of research synthesis. Sage, New York, pp 245–260
  14. Frick RW (1995) Accepting the null hypothesis. Mem Cognit 23:132–138
  15. Gerard PD, Smith DR, Weerakkody G (1998) Limits of retrospective power analysis. J Wildl Manage 62:801–807
  16. Glass GV (1977) Integrating findings: the meta-analysis of research. In: Shulman L (ed) Review of research in education, vol 5. Peacock, Itasca, pp 351–379
  17. Goodman SN, Berlin JA (1994) The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Ann Intern Med 121:200–206
  18. Hayes JP, Steidl RJ (1997) Statistical power analysis and amphibian population trends. Conserv Biol 11:273–275
  19. Hedges LV (1981) Distributional theory for Glass’s estimator of effect size and related estimators. J Educ Stat 6:107–128
  20. Hedges L, Olkin I (1985) Statistical methods for meta-analysis. Academic, New York
  21. Hoenig JM, Heisey DM (2001) The abuse of power: the pervasive fallacy of power calculations for data analysis. Am Stat 55:19–24
  22. Hunter JE, Schmidt FL (1990) Methods of meta-analysis: correcting error and bias in research findings. Sage, Newbury Park
  23. Jennions MD, Møller AP (2003) A survey of the statistical power of research in behavioral ecology and animal behavior. Behav Ecol 14:438–445
  24. Kirk RE (1996) Practical significance: a concept whose time has come. Educ Psychol Meas 56:746–759
  25. Lipsey MW, Wilson DB (2000) Practical meta-analysis. Sage, Beverly Hills
  26. Maddocks SA, Bennett ATD, Hunt S, Cuthill IC (2001) Context-dependent visual preferences in starlings and blue tits: mate choice and light environment. Anim Behav 63:69–75
  27. Nakagawa S (2004) A farewell to Bonferroni: the problems of low statistical power and publication bias. Behav Ecol (doi: 10.1093/beheco/arh107) (in press)
  28. Nickerson RS (2000) Null hypothesis significance testing: a review of an old and continuing controversy. Psychol Methods 5:241–301
  29. Parkhurst DF (2001) Statistical significance tests: equivalence and reverse tests should reduce misinterpretation. BioScience 51:1051–1057
  30. Perlman M, Wu L (1999) The emperor’s new tests. Stat Sci 14:355–369
  31. Rosenthal R (1993) Cumulating evidence. In: Keren G, Lewis C (eds) A handbook for data analysis in the behavioral sciences: methodological issues. Erlbaum, Hillsdale, pp 519–559
  32. Rosenthal R (1994) Parametric measures of effect size. In: Cooper H, Hedges LV (eds) The handbook of research synthesis. Sage, New York, pp 231–244
  33. Sedlmeier P, Gigerenzer G (1989) Do studies of statistical power have an effect on the power of studies? Psychol Bull 105:309–316
  34. Steidl RJ, Thomas L (2001) Power analysis and experimental design. In: Scheiner SM, Gurevitch J (eds) Design and analysis of ecological experiments, 2nd edn. Oxford University Press, Oxford, pp 14–36
  35. Still AW (1982) On the number of subjects used in animal behaviour experiments. Anim Behav 30:873–880
  36. Stoehr AM (1999) Are significance thresholds appropriate for the study of animal behaviour? Anim Behav 57:F22–F25
  37. Thomas L (1997) Retrospective power analysis. Conserv Biol 11:276–280
  38. Thomas RJ, Cuthill IC (2002) Body mass regulation and the daily singing routines of European robins. Anim Behav 63:285–292
  39. Thompson B (2002) What future quantitative social science research could look like: confidence intervals for effect sizes. Educ Res 31:25–32
  40. Thompson CF, Neill AJ (1991) House wrens do not prefer clean nestboxes. Anim Behav 42:1022–1024

Copyright information

© Springer-Verlag and ISPA 2004

Authors and Affiliations

  1. Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
  2. Department of Psychology, University of Waikato, Hamilton, New Zealand
