P < .05 is in the Eye of the Beholder: A Response to Beaujean and Farmer (2020)

Abstract

The recent commentary by Beaujean and Farmer (2020) on the original paper by Dixon et al. (2019) serves as a cautionary tale of selective p-values, the law of small N sizes, and the Type-II error. We believe these authors have crafted a somewhat questionable argument in which only 57% of the original Dixon et al. data were re-analyzed, based on a series of assumptions that removed significant power from the original statistical analysis. Here we provide an additional 12 re-analyses of the original data and demonstrate that, depending on the testing assumptions and data inclusion criteria, the p-value may or may not exceed the commonly used .05 level. Although we support healthy discussion of experimental methods and appreciate the reinterpretation of our findings, the reader is cautioned that because Beaujean and Farmer’s conclusions rest on such a restricted range of data and on different pre-analytic assumptions, the actual importance of their obtained p-value must remain in the eye of the beholder.

Notes

  1. We further point out that many existing studies have demonstrated the robustness of common statistical models when the sample is not normally distributed, such as the t test (Srivastava, 1958), ANOVA (Blanca et al., 2017), and even ANCOVA, which is robust against Type-I error under certain circumstances (i.e., homogeneity of slopes, which the original Dixon et al. data met; Levy, 1980). However, we recognize that readers may prefer using alternative statistical models over accepting this robustness; those alternatives are presented in the next section.
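
The robustness claim in the note above can be illustrated with a small Monte Carlo sketch. This is not a re-analysis of the Dixon et al. data: the group size (n = 20), the exponential population, the number of replications, and the function names `pooled_t` and `type1_rate` are all assumptions chosen for illustration. Both groups are drawn from the same skewed population, so any rejection at the .05 level is a false positive, and the empirical rejection rate can be compared with the nominal 5%.

```python
import random
import statistics

# Two-sided critical value of Student's t at alpha = .05 with df = 38
# (two groups of n = 20). Hard-coded so the sketch needs only the
# standard library; this group size is an illustrative assumption.
T_CRIT_DF38 = 2.024


def pooled_t(x, y):
    """Student's two-sample t statistic with a pooled variance estimate."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = (pooled_var * (1 / nx + 1 / ny)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se


def type1_rate(n_reps=2000, n=20, seed=0):
    """Share of null simulations in which |t| exceeds the critical value.

    The default n must stay at 20 for T_CRIT_DF38 to be the right cutoff.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        # Both groups come from the same skewed (exponential) population,
        # so every rejection counted here is a false positive.
        x = [rng.expovariate(1.0) for _ in range(n)]
        y = [rng.expovariate(1.0) for _ in range(n)]
        if abs(pooled_t(x, y)) > T_CRIT_DF38:
            rejections += 1
    return rejections / n_reps


if __name__ == "__main__":
    print(f"Empirical Type-I error rate: {type1_rate():.3f}")
```

A rate close to .05 under these assumptions would be consistent with the robustness literature cited in the note. It says nothing about power, and other distributions or unequal group sizes may behave differently.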

References

  • Ackley, M., Subramanian, J. W., Moore, J. W., Litten, S., Lundy, M. P., & Bishop, S. K. (2019). A review of language development protocols for individuals with autism. Journal of Behavioral Education, 28(3), 362–388.

  • Ahad, N. A., Yin, T. S., Othman, A. R., & Yaacob, C. R. (2011). Sensitivity of normality tests to non-normal data. Sains Malaysiana, 40(6), 637–641.

  • Amrhein, V., Korner-Nievergelt, F., & Roth, T. (2017). The earth is flat (p > 0.05): Significance thresholds and the crisis of unreplicable research. PeerJ, 5, e3544. https://doi.org/10.7717/peerj.3544

  • Beaujean, A. A., & Farmer, R. L. (2020). Conceptual and methodological concerns: A commentary on “Randomized controlled trial evaluation of ABA content on IQ gains in children with autism.” Journal of Behavioral Education. https://doi.org/10.1007/s10864-020-09408-z

  • Blanca, M. J., Alarcón, R., Arnau, J., Bono, R., & Bendayan, R. (2017). Non-normal data: Is ANOVA still a valid option? Psicothema, 29(4), 552–557.

  • Carver, R. (1978). The case against statistical significance testing. Harvard Educational Review, 48(3), 378–399.

  • Carver, R. P. (1993). The case against statistical significance testing, revisited. The Journal of Experimental Education, 61(4), 287–292.

  • Cassidy, S., Roche, B., & Hayes, S. C. (2011). A relational frame training intervention to raise intelligence quotients: A pilot study. The Psychological Record, 61(2), 173–198.

  • Cassidy, S., Roche, B., & O’Hora, D. (2010). Relational frame theory and human intelligence. European Journal of Behavior Analysis, 11(1), 37–51.

  • Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003.

  • Cohen, N. J., Davine, M., & Meloche-Kelly, M. (1989). Prevalence of unsuspected language disorders in a child psychiatric population. Journal of the American Academy of Child & Adolescent Psychiatry, 28(1), 107–111.

  • Colbert, D., Tyndall, I., Roche, B., & Cassidy, S. (2018). Can SMART training really increase intelligence? A replication study. Journal of Behavioral Education, 27(4), 509–531.

  • Collins, L. M., & Sayer, A. G. (2001). New methods for the analysis of change. American Psychological Association. https://doi.org/10.1037/10409-000

  • Curran, P. J., & Hussong, A. M. (2003). The use of latent trajectory models in psychopathology research. Journal of Abnormal Psychology, 112(4), 526.

  • Dixon, M. R. (2014a). PEAK relational training system: Direct training module. Shawnee Scientific Press.

  • Dixon, M. R. (2014b). PEAK relational training system: Generalization module. Shawnee Scientific Press.

  • Dixon, M. R. (2015). PEAK relational training system: Equivalence module. Shawnee Scientific Press.

  • Dixon, M. R. (2016). PEAK relational training system: Transformation module. Shawnee Scientific Press.

  • Dixon, M. R., Paliliunas, D., Barron, B. F., Schmick, A. M., & Stanley, C. R. (2019). Randomized controlled trial evaluation of ABA content on IQ gains in children with autism. Journal of Behavioral Education, 1–23.

  • Dixon, M. R., Whiting, S. W., Rowsey, K., & Belisle, J. (2014). Assessing the relationship between intelligence and the PEAK relational training system. Research in Autism Spectrum Disorders, 8(9), 1208–1213.

  • Einstein, A. (1905). On the electrodynamics of moving bodies. Annalen der Physik, 17(10), 891–921.

  • Fisher, R. A. (1925). Statistical methods for research workers. Oliver and Boyd.

  • Fisher, R. A. (1926). The arrangement of field experiments. Journal of the Ministry of Agriculture of Great Britain, 33, 503–515.

  • Flanagan, D. P., & Alfonso, V. C. (2017). Essentials of WISC-V assessment. John Wiley & Sons.

  • Francis, D. J., Fletcher, J. M., Stuebing, K. K., Davidson, K. C., & Thompson, N. M. (1992). Analysis of change: Modeling individual growth. Journal of Consulting and Clinical Psychology, 59(1), 27–37.

  • Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337–350.

  • Hayes, S. C., Barnes-Holmes, D., & Roche, B. (2001). Relational frame theory: A post-Skinnerian account of human language and cognition. Springer Science & Business Media.

  • Hessl, D., Nguyen, D. V., Green, C., Chavez, A., Tassone, F., Hagerman, R. J., Senturk, D., Schneider, A., Lightbody, A., & Reiss, A. L. (2009). A solution to limitations of cognitive testing in children with intellectual disabilities: The case of fragile X syndrome. Journal of Neurodevelopmental Disorders, 1(1), 33–45.

  • Howell, D. C. (2013a). Analysis of variance and covariance as general linear models. In Statistical Methods for Psychology 8th Edition (pp. 573–622). Cengage Learning.

  • Howell, D. C. (2013b). The normal distribution. In Statistical Methods for Psychology 8th Edition (pp. 63–82). Cengage Learning.

  • Huck, S. W. (2011). The Analysis of Covariance. In S. W. Huck (Ed.), Reading Statistics and Research Sixth Edition (pp. 343–366). Pearson.

  • Kaufman, A. S. (2018). Contemporary intellectual assessment: Theories, tests, and issues. Guilford Publications.

  • Koegel, L. K., Koegel, R. L., & Smith, A. (1997). Variables related to differences in standardized test outcomes for children with autism. Journal of Autism and Developmental Disorders, 27(3), 233–243.

  • Le Boedec, K. (2016). Sensitivity and specificity of normality tests and consequences on reference interval accuracy at small sample size: A computer-simulation study. Veterinary Clinical Pathology, 45(4), 648–656.

  • Levy, K. J. (1980). A Monte Carlo study of analysis of covariance under violations of the assumptions of normality and equal regression slopes. Educational and Psychological Measurement, 40(4), 835–840.

  • McKenzie, K., & Murray, A. L. (2015). Evaluating the use of the Child and Adolescent Intellectual Disability Screening Questionnaire (CAIDS-Q) to estimate IQ in children with low intellectual ability. Research in Developmental Disabilities, 37, 31–36.

  • Michelson, A. A., & Morley, E. W. (1887). On the relative motion of the earth and of the luminiferous ether. Sidereal Messenger, 6, 306–310.

  • Minton, J., Campbell, M., Green, W. H., Jennings, S., & Samit, C. (1982). Cognitive assessment of siblings of autistic children. Journal of the American Academy of Child Psychiatry, 21(3), 256–261.

  • Moore, J. L., Yi, Z., Hinman, J. M., Barron, B. F., & Dixon, M. R. (2020). Examining the convergent validity between the PEAK Relational Training System’s semi-standardized and standardized skill assessments. Journal of Developmental and Physical Disabilities. https://doi.org/10.1007/s10882-020-09771-9.

  • Murdoch, D. J., Tsai, Y.-L., & Adcock, J. (2008). P-values are random variables. The American Statistician, 62(3), 242–245.

  • Proust-Lima, C., Séne, M., Taylor, J. M., & Jacqmin-Gadda, H. (2014). Joint latent class models for longitudinal and time-to-event data: A review. Statistical Methods in Medical Research, 23(1), 74–90.

  • Rosales, R., Rehfeldt, R. A., & Lovett, S. (2011). Effects of multiple exemplar training on the emergence of derived relations in preschool children learning a second language. The Analysis of Verbal Behavior, 27(1), 61–74.

  • Sansone, S. M., Schneider, A., Bickel, E., Berry-Kravis, E., Prescott, C., & Hessl, D. (2014). Improving IQ measurement in intellectual disabilities using true deviation from population norms. Journal of Neurodevelopmental Disorders, 6(1), 16.

  • Schreiber, J. B. (2020). New paradigms for considering statistical significance: A way forward for health services research journals, their authors, and their readership. Research in Social and Administrative Pharmacy, 16(4), 591–594.

  • Skinner, B. F. (1957). Verbal Behavior. Appleton-Century-Crofts.

  • Srivastava, A. B. L. (1958). Effect of non-normality on the power function of t-test. Biometrika, 45(3/4), 421–430.

  • Sterling, T. D. (1959). Publication decisions and their possible effects on inferences drawn from tests of significance—or vice versa. Journal of the American statistical association, 54(285), 30–34.

  • Stolley, P. D. (1991). When genius errs: RA Fisher and the lung cancer controversy. American Journal of Epidemiology, 133(5), 416–425.

  • Thatcher, R., McAlaster, R., Lester, M., & Cantor, D. (1984). Comparisons among EEG, hair minerals and diet predictions of reading performance in children. Annals of the New York Academy of Sciences.

  • Toffalini, E., Buono, S., Zagaria, T., Calcagnì, A., & Cornoldi, C. (2019). Using Z and age-equivalent scores to address WISC-IV floor effects for children with intellectual disability. Journal of Intellectual Disability Research, 63(6), 528–538.

  • Von Rosen, D. (1991). The growth curve model: A review. Communications in Statistics-Theory and Methods, 20(9), 2791–2822.

  • Wechsler, D. (2012). Wechsler Preschool and Primary Scale of Intelligence Fourth Edition. The Psychological Corporation.

  • Wechsler, D. (2014). Wechsler Intelligence Scale for Children Fifth Edition. The Psychological Corporation.

  • Yazici, B., & Yolacan, S. (2007). A comparison of various tests of normality. Journal of Statistical Computation and Simulation, 77(2), 175–183.

  • Ziliak, S. T. (2008). Retrospectives: Guinnessometrics: The Economic Foundation of “Student’s” t. Journal of Economic Perspectives, 22(4), 199–216.

  • Ziliak, S., & McCloskey, D. N. (2008). The cult of statistical significance: How the standard error costs us jobs, justice, and lives. University of Michigan Press.

Funding

No funds, grants, or other support was received.

Author information

Corresponding author

Correspondence to Mark R. Dixon.

Ethics declarations

Conflict of interest

Mark R. Dixon receives small royalties from the sales of the PEAK.

Ethical approval

The original study was approved by the Human Subjects Committee at the Southern Illinois University Carbondale (Protocol: 15334).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yi, Z., Schreiber, J.B., Paliliunas, D. et al. P < .05 is in the Eye of the Beholder: A Response to Beaujean and Farmer (2020). J Behav Educ 30, 489–511 (2021). https://doi.org/10.1007/s10864-021-09435-4

  • DOI: https://doi.org/10.1007/s10864-021-09435-4

Keywords

  • Relational frame theory
  • PEAK
  • Intelligence
  • Autism