Abstract
The recent commentary by Beaujean and Farmer (2020) on the original paper by Dixon et al. (2019) serves as a cautionary tale about selective p-values, the law of small numbers, and Type II error. We believe these authors have crafted a somewhat questionable argument in which only 57% of the original Dixon et al. data were re-analyzed, based on a series of assumptions that substantially reduced the statistical power of the original analysis. Here we provide an additional 12 re-analyses of the original data and demonstrate that, depending on the testing assumptions and data-inclusion criteria, the p-value may or may not exceed the commonly used .05 level. Although we support healthy discussion of experimental methods and appreciate the reinterpretation of our findings, readers are cautioned that because Beaujean and Farmer's conclusions rest on such a restricted range of data and on different pre-analytic assumptions, the importance of their obtained p-value must remain in the eye of the beholder.
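The link between re-analyzing a subset of the data and Type II error can be illustrated with a simple normal-approximation power calculation. The sketch below is illustrative only: the effect size (d = 0.5), the group size (30 per group), and the function name `approx_power_two_sample` are hypothetical choices of ours, not values or methods from Dixon et al. (2019) or Beaujean and Farmer (2020). It shows how keeping roughly 57% of a sample lowers power, and therefore raises the Type II error rate, for the same underlying effect.

```python
from statistics import NormalDist


def approx_power_two_sample(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample test of a standardized
    mean difference d, using the normal approximation. This ignores the
    t-distribution correction, so it slightly overstates power at small n."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = d * (n_per_group / 2) ** 0.5
    # Probability that the test statistic falls beyond either critical value
    return (1 - z.cdf(z_crit - noncentrality)) + z.cdf(-z_crit - noncentrality)


# Hypothetical values for illustration only (not the Dixon et al. data)
effect_size = 0.5
full_n = 30
reduced_n = round(full_n * 0.57)  # keep roughly 57% of participants per group

for label, n in [("full sample", full_n), ("57% subset", reduced_n)]:
    power = approx_power_two_sample(effect_size, n)
    print(f"{label}: n = {n} per group, power = {power:.2f}, Type II error = {1 - power:.2f}")
```

With d = 0 the function returns alpha itself, which is a quick sanity check that the two tails are being summed correctly.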

Notes
We further note that many existing studies have demonstrated the robustness of common statistical models to non-normally distributed samples, including the t test (Srivastava, 1958) and ANOVA (Blanca et al., 2017), and even the robustness of ANCOVA against Type I error under certain circumstances (i.e., homogeneity of regression slopes, which the original Dixon et al. data met; Levy, 1980). However, we recognize that some researchers prefer alternative statistical models to relying on this robustness; such models are presented in the next section.
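Robustness claims of this kind are typically supported by Monte Carlo simulation. The following is a minimal sketch of that idea, not the procedure used by Srivastava (1958), Blanca et al. (2017), or Levy (1980): it draws two groups of 20 from the same distribution, applies a pooled-variance Student's t test, and counts how often the true null hypothesis is rejected at α = .05. The group size, seed, number of replications, and the use of an exponential distribution as the non-normal case are our own assumptions, and the critical value 2.024 (t with 38 df) is hard-coded to keep the sketch dependency-free. It checks only the Type I error side of robustness, not power or ANCOVA.

```python
import random

T_CRIT_DF38 = 2.024  # two-sided alpha = .05 critical value of t with 38 df
N_PER_GROUP = 20     # 20 + 20 - 2 = 38 df, matching T_CRIT_DF38


def pooled_t(x, y):
    """Student's two-sample t statistic using the pooled variance."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    pooled_var = ss / (nx + ny - 2)
    return (mx - my) / (pooled_var * (1 / nx + 1 / ny)) ** 0.5


def type_i_error_rate(distribution, reps=5_000, seed=2021):
    """Proportion of simulated studies that reject a true null at alpha = .05.

    Both groups come from the same distribution, so every rejection is a
    Type I error. 'exponential' is strongly right-skewed; 'normal' is the
    textbook case for comparison.
    """
    rng = random.Random(seed)
    draw = {
        "normal": lambda: rng.gauss(0.0, 1.0),
        "exponential": lambda: rng.expovariate(1.0),
    }[distribution]
    rejections = 0
    for _ in range(reps):
        x = [draw() for _ in range(N_PER_GROUP)]
        y = [draw() for _ in range(N_PER_GROUP)]
        if abs(pooled_t(x, y)) > T_CRIT_DF38:
            rejections += 1
    return rejections / reps


for dist in ("normal", "exponential"):
    print(f"{dist}: empirical Type I error = {type_i_error_rate(dist):.3f}")
```

With equal group sizes drawn from the same skewed distribution, the empirical rate should land near the nominal .05; unequal group sizes combined with unequal variances are the conditions under which the pooled t test is generally considered less robust.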
References
Ackley, M., Subramanian, J. W., Moore, J. W., Litten, S., Lundy, M. P., & Bishop, S. K. (2019). A review of language development protocols for individuals with autism. Journal of Behavioral Education, 28(3), 362–388.
Ahad, N. A., Yin, T. S., Othman, A. R., & Yaacob, C. R. (2011). Sensitivity of normality tests to non-normal data. Sains Malaysiana, 40(6), 637–641.
Amrhein, V., Korner-Nievergelt, F., & Roth, T. (2017). The earth is flat (p > 0.05): Significance thresholds and the crisis of unreplicable research. PeerJ, 5, e3544. https://doi.org/10.7717/peerj.3544
Beaujean, A. A., & Farmer, R. L. (2020). Conceptual and methodological concerns: A commentary on "Randomized controlled trial evaluation of ABA content on IQ gains in children with autism." Journal of Behavioral Education. https://doi.org/10.1007/s10864-020-09408-z
Blanca, M. J., Alarcón, R., Arnau, J., Bono, R., & Bendayan, R. (2017). Non-normal data: Is ANOVA still a valid option? Psicothema, 29(4), 552–557.
Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review, 48(3), 378–399.
Carver, R. P. (1993). The case against statistical significance testing, revisited. The Journal of Experimental Education, 61(4), 287–292.
Cassidy, S., Roche, B., & Hayes, S. C. (2011). A relational frame training intervention to raise intelligence quotients: A pilot study. The Psychological Record, 61(2), 173–198.
Cassidy, S., Roche, B., & O’Hora, D. (2010). Relational frame theory and human intelligence. European Journal of Behavior Analysis, 11(1), 37–51.
Collins, L. M., & Sayer, A. G. (2001). New methods for the analysis of change. American Psychological Association. https://doi.org/10.1037/10409-000
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997.
Cohen, N. J., Davine, M., & Meloche-Kelly, M. (1989). Prevalence of unsuspected language disorders in a child psychiatric population. Journal of the American Academy of Child & Adolescent Psychiatry, 28(1), 107–111.
Colbert, D., Tyndall, I., Roche, B., & Cassidy, S. (2018). Can SMART training really increase intelligence? A replication study. Journal of Behavioral Education, 27(4), 509–531.
Curran, P. J., & Hussong, A. M. (2003). The use of latent trajectory models in psychopathology research. Journal of Abnormal Psychology, 112(4), 526.
Dixon, M. R. (2014a). PEAK relational training system: Direct training module. Shawnee Scientific Press.
Dixon, M. R. (2014b). PEAK relational training system: Generalization module. Shawnee Scientific Press.
Dixon, M. R. (2015). PEAK relational training system: Equivalence module. Shawnee Scientific Press.
Dixon, M. R. (2016). PEAK relational training system: Transformation module. Shawnee Scientific Press.
Dixon, M. R., Paliliunas, D., Barron, B. F., Schmick, A. M., & Stanley, C. R. (2019). Randomized controlled trial evaluation of ABA content on IQ gains in children with autism. Journal of Behavioral Education, 1–23.
Dixon, M. R., Whiting, S. W., Rowsey, K., & Belisly, J. (2014). Assessing the relationship between intelligence and the PEAK relational training system. Research in Autism Spectrum Disorders, 8(9), 1208–1213.
Einstein, A. (1905). On the electrodynamics of moving bodies. Annalen der Physik, 17(10), 891–921.
Fisher, R. A. (1925). Statistical methods for research workers. Oliver and Boyd.
Fisher, R. A. (1926). The arrangement of field experiments. Journal of the Ministry of Agriculture of Great Britain, 33, 503–515.
Flanagan, D. P., & Alfonso, V. C. (2017). Essentials of WISC-V assessment. John Wiley & Sons.
Francis, D. J., Fletcher, J. M., Stuebing, K. K., Davidson, K. C., & Thompson, N. M. (1992). Analysis of change: Modeling individual growth. Journal of Consulting and Clinical Psychology, 59(1), 27–37.
Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337–350.
Hayes, S. C., Barnes-Holmes, D., & Roche, B. (2001). Relational frame theory: A post-Skinnerian account of human language and cognition. Springer Science & Business Media.
Hessl, D., Nguyen, D. V., Green, C., Chavez, A., Tassone, F., Hagerman, R. J., Senturk, D., Schneider, A., Lightbody, A., & Reiss, A. L. (2009). A solution to limitations of cognitive testing in children with intellectual disabilities: The case of fragile X syndrome. Journal of Neurodevelopmental Disorders, 1(1), 33–45.
Howell, D. C. (2013a). Analysis of variance and covariance as general linear models. In Statistical methods for psychology (8th ed., pp. 573–622). Cengage Learning.
Howell, D. C. (2013b). The normal distribution. In Statistical methods for psychology (8th ed., pp. 63–82). Cengage Learning.
Huck, S. W. (2011). The analysis of covariance. In S. W. Huck (Ed.), Reading statistics and research (6th ed., pp. 343–366). Pearson.
Kaufman, A. S. (2018). Contemporary intellectual assessment: Theories, tests, and issues. Guilford Publications.
Koegel, L. K., Koegel, R. L., & Smith, A. (1997). Variables related to differences in standardized test outcomes for children with autism. Journal of Autism and Developmental Disorders, 27(3), 233–243.
Le Boedec, K. (2016). Sensitivity and specificity of normality tests and consequences on reference interval accuracy at small sample size: A computer-simulation study. Veterinary Clinical Pathology, 45(4), 648–656.
Levy, K. J. (1980). A Monte Carlo study of analysis of covariance under violations of the assumptions of normality and equal regression slopes. Educational and Psychological Measurement, 40(4), 835–840.
McKenzie, K., & Murray, A. L. (2015). Evaluating the use of the Child and Adolescent Intellectual Disability Screening Questionnaire (CAIDS-Q) to estimate IQ in children with low intellectual ability. Research in Developmental Disabilities, 37, 31–36.
Michelson, A. A., & Morley, E. W. (1887). On the relative motion of the earth and of the luminiferous ether. Sidereal Messenger, 6, 306–310.
Minton, J., Campbell, M., Green, W. H., Jennings, S., & Samit, C. (1982). Cognitive assessment of siblings of autistic children. Journal of the American Academy of Child Psychiatry, 21(3), 256–261.
Moore, J. L., Yi, Z., Hinman, J. M., Barron, B. F., & Dixon, M. R. (2020). Examining the convergent validity between the PEAK Relational Training System's semi-standardized and standardized skill assessments. Journal of Developmental and Physical Disabilities. https://doi.org/10.1007/s10882-020-09771-9
Murdoch, D. J., Tsai, Y.-L., & Adcock, J. (2008). P-values are random variables. The American Statistician, 62(3), 242–245.
Proust-Lima, C., Séne, M., Taylor, J. M., & Jacqmin-Gadda, H. (2014). Joint latent class models for longitudinal and time-to-event data: A review. Statistical Methods in Medical Research, 23(1), 74–90.
Rosales, R., Rehfeldt, R. A., & Lovett, S. (2011). Effects of multiple exemplar training on the emergence of derived relations in preschool children learning a second language. The Analysis of Verbal Behavior, 27(1), 61–74.
Sansone, S. M., Schneider, A., Bickel, E., Berry-Kravis, E., Prescott, C., & Hessl, D. (2014). Improving IQ measurement in intellectual disabilities using true deviation from population norms. Journal of Neurodevelopmental Disorders, 6(1), 16.
Schreiber, J. B. (2020). New paradigms for considering statistical significance: A way forward for health services research journals, their authors, and their readership. Research in Social and Administrative Pharmacy, 16(4), 591–594.
Skinner, B. F. (1957). Verbal behavior. Appleton-Century-Crofts.
Srivastava, A. B. L. (1958). Effect of non-normality on the power function of t-test. Biometrika, 45(3/4), 421–430.
Sterling, T. D. (1959). Publication decisions and their possible effects on inferences drawn from tests of significance—or vice versa. Journal of the American Statistical Association, 54(285), 30–34.
Stolley, P. D. (1991). When genius errs: R. A. Fisher and the lung cancer controversy. American Journal of Epidemiology, 133(5), 416–425.
Thatcher, R., McAlaster, R., Lester, M., & Cantor, D. (1984). Comparisons among EEG, hair minerals and diet predictions of reading performance in children. Annals of the New York Academy of Sciences.
Toffalini, E., Buono, S., Zagaria, T., Calcagnì, A., & Cornoldi, C. (2019). Using Z and age-equivalent scores to address WISC-IV floor effects for children with intellectual disability. Journal of Intellectual Disability Research, 63(6), 528–538.
Von Rosen, D. (1991). The growth curve model: A review. Communications in Statistics-Theory and Methods, 20(9), 2791–2822.
Wechsler, D. (2012). Wechsler Preschool and Primary Scale of Intelligence (4th ed.). The Psychological Corporation.
Wechsler, D. (2014). Wechsler Intelligence Scale for Children (5th ed.). The Psychological Corporation.
Yazici, B., & Yolacan, S. (2007). A comparison of various tests of normality. Journal of Statistical Computation and Simulation, 77(2), 175–183.
Ziliak, S. T. (2008). Retrospectives: Guinnessometrics: The Economic Foundation of “Student’s” t. Journal of Economic Perspectives, 22(4), 199–216.
Ziliak, S., & McCloskey, D. N. (2008). The cult of statistical significance: How the standard error costs us jobs, justice, and lives. University of Michigan Press.
Funding
No funds, grants, or other support were received.
Ethics declarations
Conflict of interests
Mark R. Dixon receives small royalties from sales of the PEAK Relational Training System.
Ethical approval
The original study was approved by the Human Subjects Committee at the Southern Illinois University Carbondale (Protocol: 15334).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Yi, Z., Schreiber, J.B., Paliliunas, D. et al. P < .05 is in the Eye of the Beholder: A Response to Beaujean and Farmer (2020). J Behav Educ 30, 489–511 (2021). https://doi.org/10.1007/s10864-021-09435-4
DOI: https://doi.org/10.1007/s10864-021-09435-4
Keywords
- Relational frame theory
- PEAK
- Intelligence
- Autism