Tutorial: Small-N Power Analysis

Abstract

Power analysis is an overlooked and underreported aspect of study design. A priori power analysis involves estimating the sample size required for a study based on predetermined maximum tolerable Type I and II error rates and the minimum effect size that would be clinically, practically, or theoretically meaningful. Power is more often discussed within the context of large-N group designs, but power analyses can be used in small-N research and within-subjects designs to maximize the probative value of the research. In this tutorial, case studies illustrate how power analysis can be used by behavior analysts to compare two independent groups, behavior in baseline and intervention conditions, and response characteristics across multiple within-subject treatments. After reading this tutorial, the reader will be able to estimate just noticeable differences using means and standard deviations, convert them to standardized effect sizes, and use G*Power to determine the sample size needed to detect an effect with desired power.
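The workflow outlined above (choose the smallest meaningful difference, standardize it, then solve for sample size) can be sketched in a few lines of Python. This is a minimal illustration and not the tutorial's G*Power procedure: the function names and example numbers are hypothetical, and the sample-size step assumes a two-tailed comparison of two independent groups using the large-sample normal approximation, so software based on the noncentral t distribution may report a slightly larger N.

```python
from math import ceil
from statistics import NormalDist


def standardized_effect(mean_diff, sd):
    """Cohen's d: a raw mean difference divided by a standard deviation."""
    return mean_diff / sd


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group N for a two-tailed, two-independent-groups test.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the Type I error rate
    z_power = z.inv_cdf(power)          # quantile for 1 minus the Type II error rate
    return ceil(2 * ((z_alpha + z_power) / abs(d)) ** 2)


# Hypothetical numbers: a 2-point difference is the smallest one that matters,
# and scores have a standard deviation of about 4 points.
d = standardized_effect(mean_diff=2.0, sd=4.0)
print(d, n_per_group(d))
```

Raising the desired power, lowering alpha, or shrinking the smallest effect of interest each increases the required N, which is the trade-off an a priori analysis makes explicit.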


Notes

  1. For two groups with standard deviations \( SD_1 \) and \( SD_2 \) and sample sizes \( N_1 \) and \( N_2 \), the pooled standard deviation is \( \sqrt{\frac{(N_1-1)SD_1^2+(N_2-1)SD_2^2}{N_1+N_2-2}} \), which simplifies to \( \sqrt{\frac{SD_1^2+SD_2^2}{2}} \) if sample sizes are equal.
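As a quick numerical check of this note, the sketch below computes the standard pooled standard deviation (the square root of the degrees-of-freedom-weighted average of the two variances) and confirms that it reduces to the equal-N form. The function name and the example values are hypothetical.

```python
from math import isclose, sqrt


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled SD: square root of the df-weighted average of the two variances."""
    return sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


# Hypothetical values: standard deviations of 3 and 5.
unequal = pooled_sd(3.0, 5, 5.0, 15)    # the larger group's SD gets more weight
equal = pooled_sd(3.0, 10, 5.0, 10)     # equal N: the weights cancel
shortcut = sqrt((3.0**2 + 5.0**2) / 2)  # equal-N simplification from the note
print(unequal, equal, isclose(equal, shortcut))
```

With unequal group sizes the result is pulled toward the standard deviation of the larger group, so the equal-N shortcut should only be used when the groups really are the same size.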


Author information


Correspondence to Elizabeth G. E. Kyonka.


Cite this article

Kyonka, E.G.E. Tutorial: Small-N Power Analysis. Perspect Behav Sci 42, 133–152 (2019). https://doi.org/10.1007/s40614-018-0167-4


Keywords

  • Experimental design
  • A priori power analysis
  • Effect size
  • Sample size
  • Tests of statistical significance
  • Hypothesis testing
  • G*Power