Cohen’s weighted kappa with additive weights

  • Matthijs J. Warrens
Regular Article


Cohen’s weighted kappa is a popular descriptive statistic for summarizing interrater agreement on an ordinal scale. An agreement table with \(n\in \mathbb N _{\ge 3}\) ordered categories can be collapsed into \(n-1\) distinct \(2\times 2\) tables by combining adjacent categories. Weighted kappa with linear weights is a weighted average of the kappas corresponding to the \(2\times 2\) tables, where the weights are the denominators of the \(2\times 2\) kappas. It is shown that the linearly weighted kappa is a special case of a more general weighted kappa that is a weighted average of the \(2\times 2\) kappas. This weighted kappa has additive weights, that is, given initial weights for pairs of adjacent categories the weight for two non-adjacent categories is obtained by adding the weights of all pairs of adjacent categories between the two.
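The decomposition described above can be checked numerically. The sketch below (a minimal illustration, not code from the article; the agreement table is hypothetical) computes the linearly weighted kappa of a \(4\times 4\) table directly, then recomputes it as the weighted average of the three \(2\times 2\) kappas obtained by dichotomizing at each cut point, weighting each \(2\times 2\) kappa by its own denominator \(1-p_{e,k}\):

```python
import numpy as np

def po_pe(table):
    """Observed and chance-expected agreement of a square count table."""
    p = table / table.sum()
    po = np.trace(p)
    pe = p.sum(axis=1) @ p.sum(axis=0)
    return po, pe

def linear_weighted_kappa(table):
    """Cohen's weighted kappa with linear weights w_ij = 1 - |i-j|/(n-1)."""
    n = table.shape[0]
    p = table / table.sum()
    i, j = np.indices((n, n))
    w = 1.0 - np.abs(i - j) / (n - 1)
    po = (w * p).sum()
    pe = (w * np.outer(p.sum(axis=1), p.sum(axis=0))).sum()
    return (po - pe) / (1.0 - pe)

def collapse(table, k):
    """Dichotomize: categories 0..k versus k+1..n-1."""
    return np.array([[table[:k+1, :k+1].sum(), table[:k+1, k+1:].sum()],
                     [table[k+1:, :k+1].sum(), table[k+1:, k+1:].sum()]])

# A hypothetical 4-category agreement table (counts for two raters).
T = np.array([[10., 4., 1., 0.],
              [ 3., 8., 2., 1.],
              [ 0., 2., 6., 2.],
              [ 1., 0., 3., 7.]])

# Weighted average of the n-1 collapsed 2x2 kappas, each kappa_k
# weighted by its own denominator 1 - pe_k.  Since the weight and
# the denominator cancel, the sum of the numerators po_k - pe_k
# divided by the sum of the denominators 1 - pe_k gives the average.
num = den = 0.0
for k in range(T.shape[0] - 1):
    po, pe = po_pe(collapse(T, k))
    num += po - pe          # equals (1 - pe_k) * kappa_k
    den += 1.0 - pe

print(linear_weighted_kappa(T))
print(num / den)  # same value
```

With additive weights, the same averaging scheme applies, except that the initial weight attached to each adjacent-category cut point need not be equal; the linear case corresponds to giving every adjacent pair the same initial weight.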


Cohen’s kappa · Combining categories · Linear weights · Quadratic weights · \(2\times 2\) tables · Glasgow outcome scale

Mathematics Subject Classification (2010)

62H20 · 62P10 · 62P15



This research is part of project 451-11-026, funded by the Netherlands Organisation for Scientific Research. The author thanks two anonymous reviewers for their helpful comments and valuable suggestions on an earlier version of this article.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Institute of Psychology, Unit Methodology and Statistics, Leiden University, Leiden, The Netherlands
