On the Reliability and Accuracy of the Evaluative Method for Identifying Evidence-Based Practices in Autism



The editors of this book recently described the development and application of an “evaluative method” for assessing evidence-based practices (EBP) in autism (Reichow et al. 2008). The major results of this investigation, which were presented at the International Meeting for Autism Research (Reichow et al. 2007), indicated that the method produced highly reliable and valid results, whether the assessment was based on primary or secondary quality indicators drawn from published, peer-reviewed group research reports or from published, peer-reviewed single subject experimental design (SSED) reports. Levels of inter-examiner agreement ranged from 85%, with a Kappa, or chance-corrected, value of 0.69 (Cohen 1960), to 96%, with a Kappa value of 0.93. By the criteria of Cicchetti (2001) and Cicchetti et al. (1995), these reliability levels range from good (85%, Kappa = 0.69) to excellent (96%, Kappa = 0.93).
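The chance-corrected agreement statistic cited above is Cohen's Kappa: the proportion of observed agreement, adjusted for the agreement two raters would reach by chance alone. A minimal sketch of the computation is shown below; the 2×2 table of hypothetical counts is illustrative only (it yields 85% observed agreement but does not reproduce the Reichow et al. 2008 data).

```python
# Cohen's (1960) Kappa from a 2x2 agreement table for two raters scoring a
# quality indicator as present/absent. table[i][j] = number of items rater 1
# scored in category i and rater 2 scored in category j.

def cohens_kappa(table):
    n = sum(sum(row) for row in table)
    # proportion of observed agreement (diagonal cells)
    p_o = sum(table[i][i] for i in range(len(table))) / n
    # proportion of chance agreement, from the raters' marginal totals
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    p_c = sum(rows[i] * cols[i] for i in range(len(table))) / n ** 2
    return (p_o - p_c) / (1 - p_c)

# Hypothetical data: the raters agree on 85 of 100 items (40 "present",
# 45 "absent"), and disagree on 15.
table = [[40, 5],
         [10, 45]]
print(round(cohens_kappa(table), 2))  # 0.7
```

Note that the same 85% observed agreement can yield very different Kappa values depending on the marginal distributions, which is the "high agreement but low Kappa" paradox discussed by Feinstein and Cicchetti (1990).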


Keywords: Quality indicator; Validity index; Rater agreement; Positive behavior support; Hypothetical data



Evidence-based practice
Proportion of chance agreement
Predicted negative accuracy
Proportion of observed agreement
Proportion of observed negative agreement
Proportion of observed positive agreement
Predicted positive accuracy
Quality indicator absent
Quality indicator present
Single subject experimental design
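Several of the terms above (proportion of observed agreement and its positive and negative components) can be computed directly from a 2×2 table of two raters' present/absent judgments. The sketch below uses the standard formulas popularized by Cicchetti and Feinstein (1990); the counts are hypothetical, not data from the chapter.

```python
# Overall, positive, and negative observed agreement from a hypothetical
# 2x2 table: a = both raters score the quality indicator present,
# d = both score it absent, b and c = the two kinds of disagreement.
a, b, c, d = 40, 5, 10, 45   # illustrative counts, n = 100
n = a + b + c + d

p_o = (a + d) / n                 # proportion of observed agreement
p_pos = 2 * a / (2 * a + b + c)   # proportion of observed positive agreement
p_neg = 2 * d / (2 * d + b + c)   # proportion of observed negative agreement

print(round(p_o, 2), round(p_pos, 2), round(p_neg, 2))  # 0.85 0.84 0.86
```

Reporting positive and negative agreement separately shows whether raters agree as consistently on the absence of a quality indicator as on its presence, information a single overall percentage conceals.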


  1. Cicchetti, D. V. (1988). When diagnostic agreement is high, but reliability is low: Some paradoxes occurring in joint independent neuropsychology assessments. Journal of Clinical and Experimental Neuropsychology, 10, 605–622.
  2. Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6, 284–290.
  3. Cicchetti, D. V. (2001). The precision of reliability and validity estimates re-visited: Distinguishing between clinical and statistical significance of sample size requirements. Journal of Clinical and Experimental Neuropsychology, 23, 695–700.
  4. Cicchetti, D. V., & Feinstein, A. R. (1990). High agreement but low Kappa: II. Resolving the paradoxes. Journal of Clinical Epidemiology, 43, 551–568.
  5. Cicchetti, D. V., & Sparrow, S. S. (1981). Developing criteria for establishing interrater reliability of specific items: Applications to assessment of adaptive behavior. American Journal of Mental Deficiency, 86, 127–137.
  6. Cicchetti, D. V., Volkmar, F., Klin, A., & Showalter, D. (1995). Diagnosing autism using ICD-10 criteria: A comparison of neural networks and standard multivariate procedures. Child Neuropsychology, 1, 26–37.
  7. Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
  8. Doehring, P., Reichow, B., & Volkmar, F. R. (2007). Is it evidence-based? How to evaluate claims of effectiveness for autism. Paper presented at the International Association for Positive Behavior Support Conference, March, Boston, MA.
  9. Feinstein, A. R., & Cicchetti, D. V. (1990). High agreement but low Kappa: I. The problem of two paradoxes. Journal of Clinical Epidemiology, 43, 543–549.
  10. Fleiss, J. L., Cohen, J., & Everitt, B. S. (1969). Large sample standard errors of kappa and weighted kappa. Psychological Bulletin, 72(5), 323–327.
  11. Fleiss, J. L. (1975). Measuring agreement between two judges on the presence or absence of a trait. Biometrics, 31, 651–659.
  12. Fleiss, J. L. (1981). Statistical methods for rates and proportions (2nd ed.). New York: Wiley.
  13. Fleiss, J. L., Levin, B., & Paik, M. C. (2003). Statistical methods for rates and proportions (3rd ed.). New York: Wiley.
  14. Klin, A., Lang, J., Cicchetti, D. V., & Volkmar, F. (2000). Inter-rater reliability of clinical diagnosis and DSM-IV criteria for autistic disorder: Results of the DSM-IV autism field trial. Journal of Autism and Developmental Disorders, 30, 163–167.
  15. Kraemer, H. C. (1982). Estimating false alarms and missed events from interobserver agreement: Comment on Kaye. Psychological Bulletin, 92, 749–754.
  16. Kratochwill, T. R., & Stoiber, K. C. (2002). Evidence-based interventions in school psychology: Conceptual foundations of the procedural and coding manual of Division 16 and the Society for the Study of School Psychology task force. School Psychology Quarterly, 17, 341–389.
  17. Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.
  18. Lonigan, C. J., Elbert, J. C., & Johnson, S. B. (1998). Empirically supported psychosocial interventions for children: An overview. Journal of Clinical Child Psychology, 27, 138–145.
  19. Lord, C., Bristol-Power, M., Filipek, P. A., Gallagher, J. J., Harris, S. L., et al. (2001). Educating children with autism. Washington, DC: National Academy.
  20. Odom, S. L., Brantlinger, E., Gersten, R., Horner, R. H., Thompson, B., & Harris, K. R. (2005). Research in special education: Scientific methods and evidence-based practices. Exceptional Children, 71, 137–148.
  21. Reichow, B., Barton, E. E., Volkmar, F. R., & Cicchetti, D. V. (2007). The status of research on interventions for young children with autism spectrum disorders. Poster presented at the International Meeting for Autism Research, May, Seattle, WA.
  22. Reichow, B., Volkmar, F. R., & Cicchetti, D. V. (2008). Development of an evaluative method for determining the strength of research evidence in autism. Journal of Autism and Developmental Disorders, 38, 1311–1319.
  23. Youden, W. J. (1950). Index for rating diagnostic tests. Cancer, 3, 32–35.

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Yale Child Study Center, Yale School of Medicine, New Haven, USA
