
Psychometrika, Volume 77, Issue 2, pp 315–323

Some Paradoxical Results for the Quadratically Weighted Kappa

  • Matthijs J. Warrens

Abstract

The quadratically weighted kappa is the most commonly used weighted kappa statistic for summarizing interrater agreement on an ordinal scale. The paper presents several properties of the quadratically weighted kappa that are paradoxical. For agreement tables with an odd number of categories n, it is shown that if one of the raters uses the same base rates for categories 1 and n, categories 2 and n−1, and so on, then the value of the quadratically weighted kappa does not depend on the value of the center cell of the agreement table. Since the center cell reflects the exact agreement of the two raters on the middle category, this result calls into question the applicability of the quadratically weighted kappa to agreement studies. If one wants to report a single index of agreement for an ordinal scale, it is recommended that the linearly weighted kappa be used instead of the quadratically weighted kappa.
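To make the paradox concrete, here is a minimal numerical sketch (not taken from the paper; the table values and the helper quadratic_weighted_kappa are illustrative assumptions). It computes the quadratically weighted kappa as one minus the ratio of observed to chance-expected squared disagreement, for a 3×3 table in which the row rater's base rates for categories 1 and 3 are equal; varying the center cell then leaves the value unchanged.

```python
import numpy as np

def quadratic_weighted_kappa(counts):
    """Quadratically weighted kappa: 1 minus the ratio of observed to
    chance-expected squared disagreement between the two raters."""
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[0]
    i, j = np.indices((n, n))
    d2 = (i - j) ** 2                          # quadratic disagreement weights
    total = counts.sum()
    rows, cols = counts.sum(axis=1), counts.sum(axis=0)
    observed = (d2 * counts).sum() / total
    expected = (d2 * np.outer(rows, cols)).sum() / total ** 2
    return 1.0 - observed / expected

# Hypothetical 3x3 table of counts (rows = rater A, columns = rater B).
# Rater A's base rates for categories 1 and 3 are equal (5 and 5),
# which is the symmetry condition described in the abstract.
for center in (0, 5, 50):
    table = np.array([[3, 1,      1],
                      [0, center, 1],
                      [2, 1,      2]], dtype=float)
    print(center, round(quadratic_weighted_kappa(table), 6))
# Prints the same kappa (about 0.210526) for every value of the center cell.
```

Replacing the squared distance (i−j)² with the absolute distance |i−j| in the same computation yields the linearly weighted kappa that the paper recommends instead.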

Key words

Cohen’s kappa · weighted kappa · nominal agreement · ordinal agreement · agreement studies · radiology · quadratic weights

Notes

Acknowledgements

The author thanks three anonymous reviewers for their helpful comments and valuable suggestions on an earlier version of this article. This research is part of project 451-11-026 funded by the Netherlands Organisation for Scientific Research.

Copyright information

© The Psychometric Society 2012

Authors and Affiliations

  1. Institute of Psychology, Unit Methodology and Statistics, Leiden University, Leiden, The Netherlands
