Effects of the use of percentage agreement on behavioral observation reliabilities: A reassessment

  • Hoi K. Suen
  • Patrick S. C. Lee


The percentage agreement index has been, and continues to be, a popular measure of interobserver reliability in applied behavior analysis and child development, as well as in other fields in which behavioral observation techniques are used. An algebraic method and a linear programming method were used to assess chance-corrected reliabilities for a sample of past observations in which the percentage agreement index was used. The results indicated that, had kappa been used instead of percentage agreement, between one-fourth and three-fourths of the reported observations could be judged as unreliable against a lenient criterion, and between one-half and three-fourths could be judged as unreliable against a more stringent criterion. It is suggested that the continued use of the percentage agreement index has seriously undermined the reliabilities of past observations and can no longer be justified in future studies.
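The abstract's central point is that a high percentage agreement can coexist with a low chance-corrected agreement. The Python sketch below illustrates this for the two-observer, occurrence/nonoccurrence case using Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each observer's marginal occurrence rate. The 2x2 count layout, the function names, and the example counts are hypothetical and chosen only for illustration. The function `kappa_range` does not reproduce the authors' algebraic or linear programming methods. It is a brute-force substitute that enumerates every table consistent with a given number of agreements, showing how wide the range of kappa values compatible with a single percentage agreement can be.

```python
"""Percentage agreement versus Cohen's kappa for two observers (illustrative sketch).

Hypothetical 2x2 layout of interval counts:
    a = both observers record an occurrence
    b = observer 1 only records an occurrence
    c = observer 2 only records an occurrence
    d = both observers record a nonoccurrence
"""


def percentage_agreement(a, b, c, d):
    """Observed proportion of agreement p_o (multiply by 100 for a percentage)."""
    return (a + d) / (a + b + c + d)


def chance_agreement(a, b, c, d):
    """Agreement p_e expected by chance from each observer's occurrence rate."""
    n = a + b + c + d
    p1 = (a + b) / n  # observer 1's occurrence rate
    p2 = (a + c) / n  # observer 2's occurrence rate
    return p1 * p2 + (1 - p1) * (1 - p2)


def cohen_kappa(a, b, c, d):
    """Agreement beyond chance, divided by the maximum possible agreement beyond chance (1 - p_e)."""
    p_o = percentage_agreement(a, b, c, d)
    p_e = chance_agreement(a, b, c, d)
    return (p_o - p_e) / (1 - p_e)


def kappa_range(n, agreements):
    """Smallest and largest kappa over all 2x2 tables with n intervals and the
    given number of agreements. Brute-force enumeration (hypothetical helper),
    not the authors' algebraic or linear programming method."""
    disagreements = n - agreements
    kappas = []
    for a in range(agreements + 1):
        d = agreements - a
        for b in range(disagreements + 1):
            c = disagreements - b
            # Kappa is undefined when chance agreement p_e equals 1.
            if chance_agreement(a, b, c, d) < 1:
                kappas.append(cohen_kappa(a, b, c, d))
    return min(kappas), max(kappas)


if __name__ == "__main__":
    # Hypothetical data: 100 observation intervals of a rarely occurring behavior.
    table = (2, 4, 4, 90)
    print(percentage_agreement(*table))  # 0.92
    print(round(cohen_kappa(*table), 2))  # 0.29
    print(tuple(round(k, 2) for k in kappa_range(100, 92)))  # (-0.04, 0.84)
```

In this hypothetical example, 92% agreement on a rarely occurring behavior corresponds to a kappa of about 0.29, and the same 92% is compatible with kappa values from slightly below 0 to about 0.84, depending on the observers' marginal occurrence rates. A percentage agreement reported without the underlying table therefore cannot, by itself, establish chance-corrected reliability.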

Key words

reliability, agreement scores, interobserver reliability



Copyright information

© Plenum Publishing Corporation 1985

Authors and Affiliations

  • Hoi K. Suen, Special Projects, Northern Illinois University, DeKalb
  • Patrick S. C. Lee, The Pennsylvania State University, University Park
