Volume 76, Issue 3, pp 471–486

Cohen’s Linearly Weighted Kappa is a Weighted Average of 2×2 Kappas

  • Matthijs J. Warrens


An agreement table with n ≥ 3 ordered categories can be collapsed into n−1 distinct 2×2 tables by combining adjacent categories. Vanbelle and Albert (Stat. Methodol. 6:157–163, 2009c) showed that the components of Cohen’s weighted kappa with linear weights can be obtained from these n−1 collapsed 2×2 tables. In this paper we consider several consequences of this result. One is that the weighted kappa with linear weights can be interpreted as a weighted arithmetic mean of the kappas corresponding to the 2×2 tables, where the weights are the denominators of the 2×2 kappas. In addition, it is shown that similar results and interpretations hold for linearly weighted kappas for multiple raters.
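The weighted-mean interpretation can be checked numerically. The following is a minimal sketch, not the paper's own code: it assumes the usual linear weights w_ij = 1 − |i − j|/(n − 1), takes the k-th collapsed table to merge categories 1..k against k+1..n, and uses a made-up 3×3 table of counts. The function names are hypothetical.

```python
from math import isclose


def linear_weighted_kappa(table):
    """Cohen's weighted kappa with linear weights w_ij = 1 - |i - j| / (n - 1)."""
    n = len(table)
    total = sum(sum(row) for row in table)
    p = [[cell / total for cell in row] for row in table]
    rows = [sum(p[i]) for i in range(n)]
    cols = [sum(p[i][j] for i in range(n)) for j in range(n)]
    observed = expected = 0.0
    for i in range(n):
        for j in range(n):
            w = 1 - abs(i - j) / (n - 1)
            observed += w * p[i][j]
            expected += w * rows[i] * cols[j]
    return (observed - expected) / (1 - expected)


def collapsed_kappas(table):
    """Return (kappa_k, denominator_k) for each of the n - 1 collapsed 2x2 tables.

    Table k merges categories 1..k into one category and k+1..n into the other.
    The denominator of each 2x2 kappa is 1 minus its expected agreement.
    """
    n = len(table)
    total = sum(sum(row) for row in table)
    result = []
    for k in range(1, n):
        both_low = sum(table[i][j] for i in range(k) for j in range(k)) / total
        both_high = sum(table[i][j] for i in range(k, n) for j in range(k, n)) / total
        row_low = sum(sum(table[i]) for i in range(k)) / total
        col_low = sum(table[i][j] for i in range(n) for j in range(k)) / total
        observed = both_low + both_high
        expected = row_low * col_low + (1 - row_low) * (1 - col_low)
        denominator = 1 - expected
        result.append(((observed - expected) / denominator, denominator))
    return result


def weighted_mean_of_kappas(table):
    """Weighted arithmetic mean of the 2x2 kappas, weighted by their denominators."""
    pairs = collapsed_kappas(table)
    return sum(kappa * w for kappa, w in pairs) / sum(w for _, w in pairs)


# Hypothetical agreement counts for two raters on a 3-category ordinal scale.
table = [[20, 5, 1], [4, 15, 6], [2, 3, 14]]
assert isclose(linear_weighted_kappa(table), weighted_mean_of_kappas(table))
```

The identity holds because |i − j| equals the number of cut points k that separate categories i and j, so both the observed and the expected disagreements of the n−1 collapsed tables sum to (n − 1) times their linearly weighted counterparts.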


Cohen’s kappa; merging categories; linear weights; quadratic weights; Mielke, Berry and Johnston’s weighted kappa; Hubert’s weighted kappa




  1. Agresti, A. (1990). Categorical data analysis. New York: Wiley.
  2. Artstein, R., & Poesio, M. (2005). NLE technical note: Vol. 05-1. Kappa 3 = alpha (or beta). Colchester: University of Essex.
  3. Berry, K.J., & Mielke, P.W. (1988). A generalization of Cohen’s kappa agreement measure to interval measurement and multiple raters. Educational and Psychological Measurement, 48, 921–933.
  4. Brennan, R.L., & Prediger, D.J. (1981). Coefficient kappa: some uses, misuses, and alternatives. Educational and Psychological Measurement, 41, 687–699.
  5. Brenner, H., & Kliebsch, U. (1996). Dependence of weighted kappa coefficients on the number of categories. Epidemiology, 7, 199–202.
  6. Cicchetti, D., & Allison, T. (1971). A new procedure for assessing reliability of scoring EEG sleep recordings. The American Journal of EEG Technology, 11, 101–109.
  7. Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
  8. Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213–220.
  9. Conger, A.J. (1980). Integration and generalization of kappas for multiple raters. Psychological Bulletin, 88, 322–328.
  10. Davies, M., & Fleiss, J.L. (1982). Measuring agreement for multinomial data. Biometrics, 38, 1047–1051.
  11. Fleiss, J.L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76, 378–382.
  12. Fleiss, J.L. (1981). Statistical methods for rates and proportions. New York: Wiley.
  13. Fleiss, J.L., & Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33, 613–619.
  14. Fleiss, J.L., Cohen, J., & Everitt, B.S. (1969). Large sample standard errors of kappa and weighted kappa. Psychological Bulletin, 72, 323–327.
  15. Heuvelmans, A.P.J.M., & Sanders, P.F. (1993). Beoordelaarsovereenstemming. In Eggen, T.J.H.M., & Sanders, P.F. (Eds.) Psychometrie in de Praktijk (pp. 443–470). Arnhem: Cito Instituut voor Toetsontwikkeling.
  16. Holmquist, N.S., McMahon, C.A., & Williams, E.O. (1968). Variability in classification of carcinoma in situ of the uterine cervix. Obstetrical & Gynecological Survey, 23, 580–585.
  17. Hsu, L.M., & Field, R. (2003). Interrater agreement measures: comments on kappan, Cohen’s kappa, Scott’s π and Aickin’s α. Understanding Statistics, 2, 205–219.
  18. Hubert, L. (1977). Kappa revisited. Psychological Bulletin, 84, 289–297.
  19. Jakobsson, U., & Westergren, A. (2005). Statistical methods for assessing agreement for ordinal data. Scandinavian Journal of Caring Sciences, 19, 427–431.
  20. Janson, H., & Olsson, U. (2001). A measure of agreement for interval or nominal multivariate observations. Educational and Psychological Measurement, 61, 277–289.
  21. Kraemer, H.C. (1979). Ramifications of a population model for κ as a coefficient of reliability. Psychometrika, 44, 461–472.
  22. Kraemer, H.C., Periyakoil, V.S., & Noda, A. (2004). Tutorial in biostatistics: kappa coefficients in medical research. Statistics in Medicine, 21, 2109–2129.
  23. Krippendorff, K. (2004). Reliability in content analysis: some common misconceptions and recommendations. Human Communication Research, 30, 411–433.
  24. Kundel, H.L., & Polansky, M. (2003). Measurement of observer agreement. Radiology, 288, 303–308.
  25. Landis, J.R., & Koch, G.G. (1977). An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics, 33, 363–374.
  26. Mielke, P.W., & Berry, K.J. (2009). A note on Cohen’s weighted kappa coefficient of agreement with linear weights. Statistical Methodology, 6, 439–446.
  27. Mielke, P.W., Berry, K.J., & Johnston, J.E. (2007). The exact variance of weighted kappa with multiple raters. Psychological Reports, 101, 655–660.
  28. Mielke, P.W., Berry, K.J., & Johnston, J.E. (2008). Resampling probability values for weighted kappa with multiple raters. Psychological Reports, 102, 606–613.
  29. Nelson, J.C., & Pepe, M.S. (2000). Statistical description of interrater variability in ordinal ratings. Statistical Methods in Medical Research, 9, 475–496.
  30. Popping, R. (1983). Overeenstemmingsmaten voor Nominale Data. Unpublished doctoral dissertation, Rijksuniversiteit Groningen, Groningen.
  31. Popping, R. (2010). Some views on agreement to be used in content analysis studies. Quality & Quantity, 44, 1067–1078.
  32. Schouten, H.J.A. (1986). Nominal scale agreement among observers. Psychometrika, 51, 453–466.
  33. Schuster, C. (2004). A note on the interpretation of weighted kappa and its relations to other rater agreement statistics for metric scales. Educational and Psychological Measurement, 64, 243–253.
  34. Scott, W.A. (1955). Reliability of content analysis: the case of nominal scale coding. Public Opinion Quarterly, 19, 321–325.
  35. Vanbelle, S., & Albert, A. (2009a). Agreement between two independent groups of raters. Psychometrika, 74, 477–491.
  36. Vanbelle, S., & Albert, A. (2009b). Agreement between an isolated rater and a group of raters. Statistica Neerlandica, 63, 82–100.
  37. Vanbelle, S., & Albert, A. (2009c). A note on the linearly weighted kappa coefficient for ordinal scales. Statistical Methodology, 6, 157–163.
  38. Visser, H., & de Nijs, T. (2006). The map comparison kit. Environmental Modelling & Software, 21, 346–358.
  39. Warrens, M.J. (2008a). On similarity coefficients for 2×2 tables and correction for chance. Psychometrika, 73, 487–502.
  40. Warrens, M.J. (2008b). On the equivalence of Cohen’s kappa and the Hubert–Arabie adjusted Rand index. Journal of Classification, 25, 177–183.
  41. Warrens, M.J. (2009). k-adic similarity coefficients for binary (presence/absence) data. Journal of Classification, 26, 227–245.
  42. Warrens, M.J. (2010a). Inequalities between kappa and kappa-like statistics for k×k tables. Psychometrika, 75, 176–185.
  43. Warrens, M.J. (2010b). Cohen’s kappa can always be increased and decreased by combining categories. Statistical Methodology, 7, 673–677.
  44. Warrens, M.J. (2010c). A Kraemer-type rescaling that transforms the odds ratio into the weighted kappa coefficient. Psychometrika, 75, 328–330.
  45. Warrens, M.J. (2010d). A formal proof of a paradox associated with Cohen’s kappa. Journal of Classification, 27, 322–332.
  46. Warrens, M.J. (2010e). Inequalities between multi-rater kappas. Advances in Data Analysis and Classification, 4, 271–286.
  47. Warrens, M.J. (2011). Weighted kappa is higher than Cohen’s kappa for tridiagonal agreement tables. Statistical Methodology, 4, 271–286.
  48. Zwick, R. (1988). Another look at interrater agreement. Psychological Bulletin, 103, 374–378.

Copyright information

© The Psychometric Society 2011

Authors and Affiliations

  1. Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands
