Journal of Autism and Developmental Disorders

Volume 41, Issue 2, pp 168–174

From Bayes Through Marginal Utility to Effect Sizes: A Guide to Understanding the Clinical and Statistical Significance of the Results of Autism Research Findings

  • Domenic V. Cicchetti
  • Kathy Koenig
  • Ami Klin
  • Fred R. Volkmar
  • Rhea Paul
  • Sara Sparrow
Original Paper


The objectives of this report are: (a) to trace the theoretical roots of the concept of clinical significance, which derive from Bayesian thinking, from Marginal Utility/Diminishing Returns in economics, and from the "just noticeable difference" in psychophysics. These concepts were later translated into effect size (ES), strength of agreement, clinical significance, and related concepts, and made possible the development of power analysis; (b) to differentiate clinical significance from statistical significance; and (c) to demonstrate the utility of measures of ES and related concepts for enhancing the meaning of autism research findings. These objectives are accomplished by applying criteria for estimating clinical significance, and related concepts, to a number of areas of autism research.
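Two of the measures the abstract names can be illustrated with a short sketch. This example is not from the article itself; it computes a pooled-SD Cohen's d (the standardized mean difference popularized by Cohen, 1988) and Cohen's (1960) kappa for two raters, and applies the clinical-significance bands for kappa proposed in the Cicchetti and Sparrow line of work (below .40 poor, .40–.59 fair, .60–.74 good, .75 and above excellent). The function names and sample data are illustrative only.

```python
import math

def cohens_d(group1, group2):
    """Pooled-SD standardized mean difference (Cohen's d) for two groups."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters on nominal codes."""
    n = len(rater1)
    categories = set(rater1) | set(rater2)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Expected chance agreement from each rater's marginal proportions
    expected = sum((rater1.count(c) / n) * (rater2.count(c) / n)
                   for c in categories)
    return (observed - expected) / (1 - expected)

def kappa_band(k):
    """Clinical-significance interpretation bands for kappa
    (Cicchetti & Sparrow, 1981; Cicchetti, 1994)."""
    if k < 0.40:
        return "poor"
    if k < 0.60:
        return "fair"
    if k < 0.75:
        return "good"
    return "excellent"

# Illustrative data: two small score groups and two raters' binary codes
print(round(cohens_d([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]), 3))
k = cohens_kappa([1, 1, 0, 1], [1, 0, 0, 1])
print(round(k, 2), kappa_band(k))
```

Kappa is computed rather than raw percent agreement because, as the paper's own reliability work stresses, two raters can agree often by chance alone when one category dominates; the chance correction is what makes the coefficient clinically interpretable.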


Keywords: Clinical significance in autism research


  1. Bartko, J. J. (1966). The intraclass correlation coefficient as a measure of reliability. Psychological Reports, 19, 3–11.
  2. Bartko, J. J. (1974). Corrective note to "The intraclass correlation coefficient as a measure of reliability". Psychological Reports, 34, 418.
  3. Bayes, T. (1763). An essay, by the late Reverend Mr. Bayes, F.R.S., communicated by Mr. Price, in a letter to John Canton, A.M.F.R.S. Philosophical Transactions, Giving Some Account of the Present Undertakings, Studies and Labours of the Ingenious in Many Considerable Parts of the World, 53, 370–418.
  4. Bolanowski, S. J., Jr., & Gescheider, G. A. (Eds.). (1991). Ratio scaling of psychological magnitude: In honor of the memory of S. S. Stevens. Hillsdale, NJ: Lawrence Erlbaum Associates.
  5. Borenstein, M. (1998). The shift from significance testing to effect size estimation. In A. S. Bellack & M. Hersen (Series Eds.) & N. Schooler (Vol. Ed.), Research and methods: Comprehensive clinical psychology (Vol. 3, pp. 313–349). New York, NY: Pergamon.
  6. Borenstein, M., Rothstein, H., & Cohen, J. (2001). Power and precision: A computer program for statistical power analysis and confidence intervals. Englewood, NJ: Biostat, Inc.
  7. Cicchetti, D. V. (1988). When diagnostic agreement is high, but reliability is low: Some paradoxes occurring in joint independent neuropsychology assessments. Journal of Clinical and Experimental Neuropsychology, 10, 605–622.
  8. Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6, 284–290.
  9. Cicchetti, D. V. (2001). The precision of reliability and validity estimates re-visited: Distinguishing between clinical and statistical significance of sample size requirements. Journal of Clinical and Experimental Neuropsychology, 23, 695–700.
  10. Cicchetti, D. V. (2008). From Bayes to the just noticeable difference to effect sizes: A note to understanding the clinical and statistical significance of oenologic research findings. Journal of Wine Economics, 3, 185–193.
  11. Cicchetti, D. V., Bronen, R., Spencer, S., Haut, S., Berg, A., Oliver, P., et al. (2006). Rating scales, scales of measurement, issues of reliability: Resolving some critical issues for clinicians and researchers. Journal of Nervous and Mental Disease, 194, 557–564.
  12. Cicchetti, D. V., Lord, C., Koenig, K., Klin, A., & Volkmar, F. (2008). Reliability of the ADI-R: Multiple examiners evaluate a single case. Journal of Autism and Developmental Disorders, 38, 764–770.
  13. Cicchetti, D. V., & Sparrow, S. S. (1981). Developing criteria for establishing interrater reliability of specific items: Applications to assessment of adaptive behavior. American Journal of Mental Deficiency, 86, 127–137.
  14. Cicchetti, D. V., & Sparrow, S. S. (1990). Assessment of adaptive behavior in young children. In J. J. Johnson & J. Goldman (Eds.), Developmental assessment in clinical child psychology: A handbook (chap. 8, pp. 173–196). New York, NY: Pergamon.
  15. Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
  16. Cohen, J. (1965). Some statistical issues in psychological research. In B. B. Wolman (Ed.), Handbook of clinical psychology (pp. 95–121). New York, NY: McGraw-Hill.
  17. Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213–220.
  18. Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York, NY: Academic Press.
  19. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
  20. Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
  21. Durlak, J. A. (2009). How to select, calculate, and interpret effect sizes. Journal of Pediatric Psychology, 34, 917–928.
  22. Fechner, G. (1907). Elemente der Psychophysik I u. II. Leipzig, Germany: Breitkopf & Härtel.
  23. Finch, S., & Cumming, G. (2009). Putting research in context: Understanding confidence intervals from one or more studies. Journal of Pediatric Psychology, 34, 903–916.
  24. Fleiss, J. L. (1975). Measuring agreement between two judges on the presence or absence of a trait. Biometrics, 31, 651–659.
  25. Fleiss, J. L., & Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33, 613–619.
  26. Klin, A., Lang, J., Cicchetti, D. V., & Volkmar, F. (2000). Inter-rater reliability of clinical diagnosis and DSM-IV criteria for autistic disorder: Results of the DSM-IV autism field trial. Journal of Autism and Developmental Disorders, 30, 163–167.
  27. Kraemer, H. C., Morgan, G. H., Leech, N. L., Gliner, J. A., Vaske, J. J., & Harmon, R. J. (2003). Measures of clinical significance. Journal of the American Academy of Child and Adolescent Psychiatry, 42, 1524–1529.
  28. Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.
  29. Laupacis, A., Sackett, D. L., & Roberts, R. S. (1988). An assessment of clinically useful measures of the consequences of treatment. New England Journal of Medicine, 318, 1728–1733.
  30. Neyman, J., & Pearson, E. S. (1928). On the use and interpretation of certain test criteria for purposes of statistical inference. Biometrika, 20A, 175–240 and 263–294.
  31. Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London, Series A, 231, 289–337.
  32. Nunnally, J. C. (1978). Psychometric theory. New York, NY: McGraw-Hill.
  33. Paul, R., Chawarska, K., Cicchetti, D., & Volkmar, F. (2008). Language outcomes of toddlers with autism spectrum disorders: A two year follow-up. Autism Research, 1(2), 97–107.
  34. Paul, R., Miles-Orlovsky, S., Marcinko, H. C., & Volkmar, F. (2009). Conversational behaviors in youth with high-functioning ASD and Asperger syndrome. Journal of Autism and Developmental Disorders, 39, 115–125.
  35. Rosenthal, R. (1991). Meta-analytic procedures for social research. Applied Social Research Methods Series, 6, 1–155.
  36. Sparrow, S. S., Cicchetti, D. V., & Balla, D. A. (2005). Vineland II: A revision of the Vineland Adaptive Behavior Scales: I. Survey/caregiver form (2nd ed.). Circle Pines, MN: American Guidance Service.
  37. Sparrow, S. S., Cicchetti, D. V., & Balla, D. A. (2008). Vineland II: A revision of the Vineland Adaptive Behavior Scales: II. Expanded form (2nd ed.). Circle Pines, MN: American Guidance Service.
  38. Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677–680.
  39. Stevens, S. S. (1951). Mathematics, measurement, and psychophysics. In S. S. Stevens (Ed.), Handbook of experimental psychology (chap. 1, pp. 1–49). New York, NY: Wiley.
  40. Stevens, S. S. (1968). Measurement, statistics, and the schemapiric view. Science, 161, 849–856.
  41. Stone, H., & Sidel, J. L. (Eds.). (1993). Sensory evaluation practices (2nd ed.). New York, NY: Academic Press.
  42. Von Wieser, F. (1893). Natural value (English ed.). New York, NY: Macmillan.

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Domenic V. Cicchetti¹ (corresponding author)
  • Kathy Koenig¹
  • Ami Klin¹
  • Fred R. Volkmar¹
  • Rhea Paul¹
  • Sara Sparrow¹

  1. Child Study Center, Yale University School of Medicine, New Haven, USA
