Perspectives on Behavior Science, Volume 42, Issue 1, pp. 133–152

Tutorial: Small-N Power Analysis

  • Elizabeth G. E. Kyonka

Abstract

Power analysis is an overlooked and underreported aspect of study design. A priori power analysis involves estimating the sample size required for a study based on predetermined maximum tolerable Type I and II error rates and the minimum effect size that would be clinically, practically, or theoretically meaningful. Power is more often discussed within the context of large-N group designs, but power analyses can be used in small-N research and within-subjects designs to maximize the probative value of the research. In this tutorial, case studies illustrate how power analysis can be used by behavior analysts to compare two independent groups, behavior in baseline and intervention conditions, and response characteristics across multiple within-subject treatments. After reading this tutorial, the reader will be able to estimate just noticeable differences using means and standard deviations, convert them to standardized effect sizes, and use G*Power to determine the sample size needed to detect an effect with desired power.
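The workflow the abstract describes can be sketched in a few lines of code. The Python sketch below is a minimal illustration, not a reproduction of any example from the article: it estimates Cohen's d from hypothetical group means and standard deviations, then solves for the per-group sample size using the statsmodels library in place of G*Power. All summary statistics are placeholder values.

```python
# A priori power analysis for comparing two independent groups:
# estimate a standardized effect size (Cohen's d) from means and
# standard deviations, then solve for the per-group sample size.
import math

from statsmodels.stats.power import TTestIndPower

# Hypothetical summary statistics for the smallest difference that
# would be practically meaningful (the "just noticeable difference").
mean_1, sd_1 = 12.0, 4.0   # e.g., baseline or control group
mean_2, sd_2 = 15.0, 4.5   # e.g., intervention group

# Pooled standard deviation (assuming equal group sizes) and Cohen's d.
pooled_sd = math.sqrt((sd_1**2 + sd_2**2) / 2)
cohens_d = abs(mean_1 - mean_2) / pooled_sd

# Per-group n for a two-tailed independent-samples t test with a
# maximum Type I error rate (alpha) of .05 and power (1 - beta) of .80.
n_per_group = TTestIndPower().solve_power(
    effect_size=cohens_d, alpha=0.05, power=0.80, alternative="two-sided"
)
print(f"Cohen's d = {cohens_d:.2f}, n per group = {math.ceil(n_per_group)}")
```

With these placeholder values, d is roughly 0.70 and the required sample size is roughly 33 per group; entering the same effect size, alpha, and power into G*Power's a priori procedure for an independent-samples t test should give a comparable answer.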

Keywords

Experimental design · A priori power analysis · Effect size · Sample size · Tests of statistical significance · Hypothesis testing · G*Power

References

  1. Association for Behavior Analysis International Accreditation Board. (2017). Accreditation handbook. Portage, MI: Author.
  2. Behavior Analyst Certification Board. (2017). BCBA/BCaBA task list (5th ed.). Littleton, CO: Author.
  3. Branch, M. (2014). Malignant side effects of null-hypothesis significance testing. Theory & Psychology, 24, 256–277. https://doi.org/10.1177/0959354314525282
  4. Branch, M. N. (1999). Statistical inference in behavior analysis: some things significance testing does and does not do. Behavior Analyst, 22, 87–92.
  5. Button, K. S., Ioannidis, J., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14, 365–376.
  6. Cohen, J. (1962). The statistical power of abnormal-social psychological research: a review. Journal of Abnormal & Social Psychology, 65, 145–153.
  7. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
  8. Cohen, J. (1992a). Statistical power analysis. Current Directions in Psychological Science, 1, 98–101.
  9. Cohen, J. (1992b). A power primer. Psychological Bulletin, 112, 155–159.
  10. Cohen, L. L., Feinstein, A., Masuda, A., & Vowles, K. E. (2014). Single-case research design in pediatric psychology: considerations regarding data analysis. Journal of Pediatric Psychology, 39, 124–137.
  11. Davison, M. (1999). Statistical inference in behavior analysis: having my cake and eating it? Behavior Analyst, 22, 99–103.
  12. Faul, F., Erdfelder, E., Buchner, A., & Lang, A.-G. (2009). Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses. Behavior Research Methods, 41, 1149–1160.
  13. Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175–191.
  14. Fechner, G. T. (1912). Elements of psychophysics (H. S. Langfeld, Trans.). In B. Rand (Ed.), The classical psychologists (pp. 562–572). Retrieved from http://psychclassics.yorku.ca/Fechner/ (Original work published 1860).
  15. Fisher, W. W., & Lerman, D. C. (2014). It has been said that, "There are three degrees of falsehoods: lies, damn lies, and statistics." Journal of School Psychology, 52, 243–248.
  16. Gigerenzer, G. (2004). Mindless statistics. Journal of Socio-Economics, 33, 587–606.
  17. Greenwald, A. G. (1976). Within-subjects designs: to use or not to use? Psychological Bulletin, 83(2), 314–320.
  18. Haig, B. D. (2017). Tests of statistical significance made sound. Educational & Psychological Measurement, 77, 489–506.
  19. Hayes, L. B., & Van Camp, C. M. (2015). Increasing physical activity of children during school recess. Journal of Applied Behavior Analysis, 48, 690–695.
  20. Holt, D. D., Green, L., & Myerson, J. (2003). Is discounting impulsive? Evidence from temporal and probability discounting in gambling and non-gambling college students. Behavioural Processes, 64, 355–367.
  21. Kyonka, E. G., Rice, N., & Ward, A. A. (2017). Categorical discrimination of sequential stimuli: all SΔ are not created equal. Psychological Record, 67, 27–41.
  22. Ladd, G. T., Molina, C. A., Kerins, G. J., & Petry, N. M. (2003). Gambling participation and problems among older adults. Journal of Geriatric Psychiatry & Neurology, 16, 172–177.
  23. Lane, D. (2016). The assumption of sphericity in repeated-measures designs: what it means and what to do when it is violated. Quantitative Methods for Psychology, 12, 114–122.
  24. Madden, G. J., Petry, N. M., & Johnson, P. S. (2009). Pathological gamblers discount probabilistic rewards less steeply than matched controls. Experimental & Clinical Psychopharmacology, 17, 283–290.
  25. Mauchly, J. W. (1940). Significance test for sphericity of a normal n-variate distribution. Annals of Mathematical Statistics, 11, 204–209.
  26. Mayo, D. G., & Spanos, A. (2006). Severe testing as a basic concept in a Neyman-Pearson philosophy of induction. British Journal for the Philosophy of Science, 57, 323–357.
  27. Mayo, D. G., & Spanos, A. (2011). Error statistics. In P. S. Bandyopadhyay & M. R. Forster (Eds.), Handbook of philosophy of science: Philosophy of statistics (Vol. 7, pp. 153–198). Amsterdam, Netherlands: Elsevier.
  28. Michael, J. (1974). Statistical inference for individual organism research: mixed blessing or curse? Journal of Applied Behavior Analysis, 7, 647–653. https://doi.org/10.1901/jaba.1974.7-647
  29. Mudge, J. F., Baker, L. F., Edge, C. B., & Houlahan, J. E. (2012). Setting an optimal α that minimizes errors in null hypothesis significance tests. PLoS ONE, 7(2), e32734. https://doi.org/10.1371/journal.pone.0032734
  30. Neyman, J., & Pearson, E. S. (1928). On the use and interpretation of certain test criteria for purposes of statistical inference. Biometrika, 20A, 175–240, 263–294.
  31. Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London, Series A, 231, 289–337.
  32. Pashler, H., & Harris, C. R. (2012). Is the replicability crisis overblown? Three arguments examined. Perspectives on Psychological Science, 7, 531–536.
  33. Perone, M. (1999). Statistical inference in behavior analysis: experimental control is better. Behavior Analyst, 22, 109–116.
  34. Peterson, C. (2009). Minimally sufficient research. Perspectives on Psychological Science, 4, 7–9.
  35. Sidman, M. (1960). Tactics of scientific research: evaluating experimental data in psychology. New York, NY: Basic Books.
  36. Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2013). Life after p-hacking. Meeting of the Society for Personality and Social Psychology, New Orleans, LA, January 17–19, 2013. Available at SSRN: http://ssrn.com/abstract=2205186 or https://doi.org/10.2139/ssrn.2205186
  37. Thompson, B. (2002). "Statistical," "practical," and "clinical": how many kinds of significance do counselors need to consider? Journal of Counseling & Development, 80, 64–71. https://doi.org/10.1002/j.1556-6678.2002.tb00167.x
  38. Thompson, V. A., & Campbell, J. I. (2004). A power struggle: between- vs. within-subjects designs in deductive reasoning research. Psychologia, 47, 277–296.
  39. Trafimow, D., & Marks, M. (2015). Publishing models and article dates explained. Basic & Applied Social Psychology, 37, 1.
  40. Wasserstein, R. L., & Lazar, N. A. (2016). The ASA's statement on p-values: context, process, and purpose. American Statistician, 70, 129–133. https://doi.org/10.1080/00031305.2016.1154108
  41. Weller, R. E., Cook, E. W., Avsar, K. B., & Cox, J. E. (2008). Obese women show greater delay discounting than healthy-weight women. Appetite, 51, 563–569.
  42. Wilkinson, L., & the Task Force on Statistical Inference, American Psychological Association, Science Directorate. (1999). Statistical methods in psychology journals: guidelines and explanations. American Psychologist, 54, 594–604.
  43. Zimmermann, Z. J., Watkins, E. E., & Poling, A. (2015). JEAB research over time: species used, experimental designs, statistical analyses, and sex of subjects. Behavior Analyst, 38, 203–218.

Copyright information

© Association for Behavior Analysis International 2018

Authors and Affiliations

  1. Psychology, University of New England, Armidale, Australia
