The interpretation of statistical power after the data have been gathered


Abstract

Post-hoc power estimates (power calculated for hypothesis tests after performing them) are sometimes requested by reviewers in an attempt to promote more rigorous designs. However, they should never be requested or reported because they have been shown to be logically invalid and practically misleading. We review the problems associated with post-hoc power, particularly the fact that the resulting calculated power is a monotone function of the p value and therefore contains no additional helpful information. We then discuss some situations that seem at first to call for post-hoc power analysis, such as attempts to decide on the practical implications of a null finding, or attempts to determine whether the sample size of a secondary data analysis is adequate for a proposed analysis, and consider possible approaches to achieving these goals. We make recommendations for practice in situations in which clear recommendations can be made, and point out other situations where further methodological research and discussion are required.
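The monotone relationship mentioned in the abstract can be made concrete with a short numerical sketch. The following Python snippet is not from the paper; the two-sided z test and the helper name observed_power are illustrative assumptions. It recovers the observed z statistic from a given p value and plugs it back into the power formula as if it were the true effect, which is exactly what a post-hoc power calculation does.

```python
# A minimal sketch (not from the paper): for a two-sided z test,
# "observed" (post-hoc) power is a one-to-one, decreasing function
# of the p value, so reporting it adds nothing beyond p itself.
from scipy.stats import norm

def observed_power(p, alpha=0.05):
    """Post-hoc power implied by a two-sided z-test p value (illustrative)."""
    z_obs = norm.ppf(1 - p / 2)       # |z| statistic recovered from p
    z_crit = norm.ppf(1 - alpha / 2)  # two-sided critical value
    # Power the test would have if the true effect equaled the observed one:
    return norm.cdf(z_obs - z_crit) + norm.cdf(-z_obs - z_crit)

for p in (0.001, 0.01, 0.05, 0.20, 0.50):
    print(f"p = {p:.3f}  ->  observed power = {observed_power(p):.3f}")
# Note: a p value exactly at alpha = 0.05 always maps to an
# observed power of about 0.50, regardless of the data.
```

Under these assumptions the computed power is a deterministic, decreasing function of p alone, which is why a nonsignificant result accompanied by low post-hoc power merely restates the p value rather than providing independent evidence about the adequacy of the design.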



Acknowledgements

The corresponding author thanks Dr. Joseph L. Schafer for a helpful conversation on this topic, and Dr. J. Timothy Cannon for review and feedback. The authors thank Amanda Applegate for very helpful editorial review, and the anonymous reviewers for substantially improving the paper.

Funding

This research was supported in part by grant awards P50 DA010075 and P50 DA039838 from the National Institute on Drug Abuse (National Institutes of Health, United States). The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding institutions.

Author information

Corresponding author

Correspondence to John Joseph Dziak.

Ethics declarations

Conflict of Interest

On behalf of all authors, the corresponding author states that there are no conflicts of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed Consent

Informed consent was not required for this study.


Cite this article

Dziak, J.J., Dierker, L.C. & Abar, B. The interpretation of statistical power after the data have been gathered. Curr Psychol 39, 870–877 (2020). https://doi.org/10.1007/s12144-018-0018-1
