Cohen’s linearly weighted kappa is a weighted average

  • Matthijs J. Warrens
Open Access
Regular Article


An n × n agreement table \(F = \{f_{ij}\}\) with n ≥ 3 ordered categories can, for fixed m (2 ≤ m ≤ n − 1), be collapsed into \({\binom{n-1}{m-1}}\) distinct m × m tables by combining adjacent categories. It is shown that the components (observed and expected agreement) of Cohen’s weighted kappa with linear weights can be obtained from the m × m subtables. A consequence is that weighted kappa with linear weights can be interpreted as a weighted average of the linearly weighted kappas corresponding to the m × m tables, where the weights are the denominators of the kappas. Moreover, weighted kappa with linear weights can be interpreted as a weighted average of the linearly weighted kappas corresponding to all nontrivial subtables.
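The weighted-average statement can be checked numerically. The following is a minimal sketch, not the article's own code: it assumes the usual linear weights \(w_{ij} = 1 - |i-j|/(n-1)\), takes the denominator of each kappa to be one minus its expected agreement, and uses a hypothetical 4 × 4 table of counts `F`. All function names are illustrative.

```python
from itertools import combinations
from math import comb


def agreement_parts(table):
    """Return (observed, expected) linearly weighted agreement of a square count table."""
    n = len(table)
    total = sum(map(sum, table))
    rows = [sum(r) / total for r in table]                       # row marginals
    cols = [sum(r[j] for r in table) / total for j in range(n)]  # column marginals
    p_obs = p_exp = 0.0
    for i in range(n):
        for j in range(n):
            w = 1.0 - abs(i - j) / (n - 1)  # assumed linear weight
            p_obs += w * table[i][j] / total
            p_exp += w * rows[i] * cols[j]
    return p_obs, p_exp


def linear_kappa(table):
    """Cohen's weighted kappa with linear weights."""
    p_obs, p_exp = agreement_parts(table)
    return (p_obs - p_exp) / (1.0 - p_exp)


def collapse(table, cuts):
    """Merge adjacent categories; `cuts` holds the indices where a new group starts."""
    bounds = [0, *cuts, len(table)]
    groups = [range(a, b) for a, b in zip(bounds, bounds[1:])]
    return [[sum(table[i][j] for i in gi for j in gj) for gj in groups]
            for gi in groups]


def subtables(table, m):
    """Yield every m x m table obtained by combining adjacent categories."""
    for cuts in combinations(range(1, len(table)), m - 1):
        yield collapse(table, cuts)


def weighted_average_kappa(table, sizes):
    """Average the subtable kappas for the given sizes, weighted by their denominators."""
    num = den = 0.0
    for m in sizes:
        for sub in subtables(table, m):
            _, p_exp = agreement_parts(sub)
            weight = 1.0 - p_exp            # denominator of the subtable kappa
            num += weight * linear_kappa(sub)
            den += weight
    return num / den


# Hypothetical counts for two raters using four ordered categories.
F = [[20, 5, 2, 1],
     [4, 15, 6, 2],
     [1, 5, 18, 4],
     [0, 2, 5, 10]]

n = len(F)
print("kappa of full table:        ", linear_kappa(F))
for m in range(2, n):
    print(f"average over {comb(n - 1, m - 1)} tables, m={m}:",
          weighted_average_kappa(F, [m]))
print("average over all subtables: ", weighted_average_kappa(F, range(2, n)))
```

Under these assumptions every printed value should agree up to floating-point rounding. The reason is that the linear disagreement \(|i-j|\) counts the category boundaries lying between i and j, and each boundary is used as a cut point in the same number of collapsed tables.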


Keywords: Cohen’s kappa; Inter-rater agreement; Merging categories; Linear weights; Quadratic weights; Subtables

Mathematics Subject Classification (2010)

62H20 62P10 62P15 



The author thanks four anonymous reviewers for their helpful comments and valuable suggestions on an earlier version of this article.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.



Copyright information

© The Author(s) 2011

Authors and Affiliations

  1. Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands
