Journal of Behavioral Education, Volume 10, Issue 4, pp 205–212

Interobserver Agreement in Behavioral Research: Importance and Calculation

  • Marley W. Watkins
  • Miriam Pacheco


Behavioral researchers have developed a sophisticated methodology to evaluate behavioral change which is dependent upon accurate measurement of behavior. Direct observation of behavior has traditionally been the mainstay of behavioral measurement. Consequently, researchers must attend to the psychometric properties, such as interobserver agreement, of observational measures to ensure reliable and valid measurement. Of the many indices of interobserver agreement, percentage of agreement is the most popular. Its use persists despite repeated admonitions and empirical evidence indicating that it is not the most psychometrically sound statistic to determine interobserver agreement due to its inability to take chance into account. Cohen's (1960) kappa has long been proposed as the more psychometrically sound statistic for assessing interobserver agreement. Kappa is described and computational methods are presented.
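The contrast between percentage of agreement and kappa can be illustrated with a minimal sketch. It assumes two observers coding the same 20 intervals with a nominal on/off code; the data, the function names `percent_agreement` and `cohen_kappa`, and the category labels are hypothetical and are not taken from the article. The calculation follows the standard definition of Cohen's (1960) kappa, (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e is the proportion expected by chance from each observer's marginal totals.

```python
from collections import Counter


def percent_agreement(obs_a, obs_b):
    """Proportion of intervals on which both observers recorded the same code."""
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("observer records must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(obs_a, obs_b))
    return matches / len(obs_a)


def cohen_kappa(obs_a, obs_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(obs_a)
    p_o = percent_agreement(obs_a, obs_b)
    count_a = Counter(obs_a)
    count_b = Counter(obs_b)
    # Chance agreement: sum over categories of the product of the two
    # observers' marginal proportions for that category.
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical records for 20 observation intervals (illustration only).
observer_a = ["on"] * 16 + ["on"] * 2 + ["off"] * 1 + ["off"] * 1
observer_b = ["on"] * 16 + ["off"] * 2 + ["on"] * 1 + ["off"] * 1

print(round(percent_agreement(observer_a, observer_b), 2))  # 0.85
print(round(cohen_kappa(observer_a, observer_b), 2))        # 0.32
```

In this made-up data set the behavior is recorded in most intervals, so the two observers would agree often by chance alone (p_e = 0.78). Percentage of agreement is 85%, yet kappa is only about 0.32, which is the kind of discrepancy the abstract attributes to the failure of percentage of agreement to take chance into account.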

Keywords: interobserver agreement, kappa, interrater reliability, observer agreement




  1. American Psychological Association, American Educational Research Association, and National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
  2. Baer, D. M. (1977). Reviewer's comment: Just because it's reliable doesn't mean that you can use it. Journal of Applied Behavior Analysis, 10, 117–119.
  3. Berk, R. A. (1979). Generalizability of behavioral observations: A clarification of interobserver agreement and interobserver reliability. American Journal of Mental Deficiency, 83, 460–472.
  4. Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6, 284–290.
  5. Ciminero, A. R., Calhoun, K. S., & Adams, H. E. (Eds.). (1986). Handbook of behavioral assessment (2nd ed.). New York: Wiley.
  6. Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
  7. Cone, J. D. (1977). The relevance of reliability and validity for behavioral assessment. Behavior Therapy, 8, 411–426.
  8. Cone, J. D. (1988). Psychometric considerations and the multiple models of behavioral assessment. In A. S. Bellack & M. Hersen (Eds.), Behavioral assessment: A practical handbook (3rd ed.). New York: Pergamon.
  9. Dunn, G., & Everitt, B. (1995). Clinical biostatistics: An introduction to evidence-based medicine. London: Edward Arnold.
  10. Everitt, B. S. (1994). Statistical methods for medical investigations (2nd ed.). New York: Halsted Press.
  11. Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76, 378–382.
  12. Fleiss, J. L. (1981). Statistical methods for rates and proportions. New York: Wiley.
  13. Foster, S. L., Bell-Dolan, D. J., & Burge, D. A. (1988). Behavioral observation. In A. S. Bellack & M. Hersen (Eds.), Behavioral assessment: A practical handbook (3rd ed.). New York: Pergamon.
  14. Gresham, F. M. (1998). Designs for evaluating behavior change. In T. S. Watson & F. M. Gresham (Eds.), Handbook of child behavior therapy. New York: Plenum.
  15. Hartmann, D. P. (1977, Spring). Considerations in the choice of interobserver reliability estimates. Journal of Applied Behavior Analysis, 10, 103–116.
  16. Hoge, R. D. (1985). The validity of direct observation measures of pupil classroom behavior. Review of Educational Research, 55, 469–483.
  17. Hops, H., Davis, B., & Longoria, N. (1995). Methodological issues in direct observation: Illustrations with the living in familial environments (LIFE) coding system. Journal of Clinical Child Psychology, 24, 193–203.
  18. Johnson, J. M., & Pennypacker, H. S. (1993). Strategies and tactics of human behavioral research (2nd ed.). Hillsdale, NJ: Erlbaum.
  19. Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.
  20. Langenbucher, J., Labouvie, E., & Morgenstern, J. (1996). Methodological developments: Measuring diagnostic agreement. Journal of Consulting and Clinical Psychology, 64, 1285–1289.
  21. McDermott, P. A. (1988). Agreement among diagnosticians or observers: Its importance and determination. Professional School Psychology, 3, 225–240.
  22. Nelson, L. D., & Cicchetti, D. V. (1995). Assessment of emotional functioning in brain-impaired individuals. Psychological Assessment, 7, 404–413.
  23. Shrout, P. E., Spitzer, R. L., & Fleiss, J. L. (1987). Comment: Quantification of agreement in psychiatric diagnosis revisited. Archives of General Psychiatry, 44, 172–178.
  24. Suen, H. K. (1988). Agreement, reliability, accuracy, and validity: Toward a clarification. Behavioral Assessment, 10, 343–366.
  25. Suen, H. K., & Lee, P. S. (1985). Effects of the use of percentage agreement on behavioral observation reliabilities: A reassessment. Journal of Psychopathology and Behavioral Assessment, 7, 221–234.
  26. Wasik, B. H., & Loven, M. D. (1980). Classroom observational data: Sources of inaccuracy and proposed solutions. Behavioral Assessment, 2, 211–227.
  27. Watkins, M. W. (1988). MacKappa [Computer software]. Pennsylvania State University: Author.

Copyright information

© Human Sciences Press, Inc. 2000

Authors and Affiliations

  • Marley W. Watkins (1)
  • Miriam Pacheco
  1. Department of Educational and School Psychology, The Pennsylvania State University
