# Tutorial: Small-N Power Analysis

## Abstract

Power analysis is an overlooked and underreported aspect of study design. A priori power analysis involves estimating the sample size required for a study based on predetermined maximum tolerable Type I and II error rates and the minimum effect size that would be clinically, practically, or theoretically meaningful. Power is more often discussed within the context of large-N group designs, but power analyses can be used in small-N research and within-subjects designs to maximize the probative value of the research. In this tutorial, case studies illustrate how power analysis can be used by behavior analysts to compare two independent groups, behavior in baseline and intervention conditions, and response characteristics across multiple within-subject treatments. After reading this tutorial, the reader will be able to estimate just noticeable differences using means and standard deviations, convert them to standardized effect sizes, and use G*Power to determine the sample size needed to detect an effect with desired power.

## Notes

1. For two groups with standard deviations SD1 and SD2, and sample sizes N1 and N2, the pooled standard deviation is $$\sqrt{\frac{(N_1-1)SD_1^2+(N_2-1)SD_2^2}{N_1+N_2-2}}$$, which simplifies to $$\sqrt{\frac{SD_1^2+SD_2^2}{2}}$$ if sample sizes are equal.
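As a rough illustration of how a pooled standard deviation feeds into a standardized effect size and a sample-size estimate, the sketch below uses only the Python standard library. The function names and the example numbers are hypothetical and do not come from the article. The pooled standard deviation is computed in its usual form (variances weighted by N − 1 and divided by N1 + N2 − 2). The sample-size function uses a normal approximation for a two-tailed independent-samples comparison, so it may return a value slightly smaller than the exact noncentral-t calculation that G*Power performs.

```python
import math
from statistics import NormalDist


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    weighted = (n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
    return math.sqrt(weighted / (n1 + n2 - 2))


def cohens_d(mean1, mean2, sd_pooled):
    """Standardized mean difference (Cohen's d)."""
    return (mean1 - mean2) / sd_pooled


def approx_n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group N for a two-tailed, two-group comparison.

    Assumption: normal approximation, n = 2 * ((z_alpha + z_power) / d) ** 2.
    This is not the exact noncentral-t computation used by G*Power.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / abs(d)) ** 2)


# Hypothetical pilot values: group means of 12 and 10, SD of 4 in each
# group, 20 participants per group.
sd_p = pooled_sd(4, 20, 4, 20)
d = cohens_d(12, 10, sd_p)
print(sd_p, d, approx_n_per_group(d))
```

Because the approximation ignores the extra uncertainty in estimating the standard deviation from small samples, it is best treated as a starting point to be checked against G*Power rather than as a substitute for it.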

## Author information

### Corresponding author

Correspondence to Elizabeth G. E. Kyonka.

## Cite this article

Kyonka, E.G.E. Tutorial: Small-N Power Analysis. Perspect Behav Sci 42, 133–152 (2019). https://doi.org/10.1007/s40614-018-0167-4

### Keywords

• Experimental design
• A priori power analysis
• Effect size
• Sample size
• Tests of statistical significance
• Hypothesis testing
• G*Power