
Advances in Data Analysis and Classification, Volume 4, Issue 4, pp 271–286

Inequalities between multi-rater kappas

  • Matthijs J. Warrens
Open Access
Regular Article

Abstract

The paper presents inequalities between four descriptive statistics that have been used to measure the nominal agreement between two or more raters. Each of the four statistics can be expressed as a function of the pairwise agreement information. Light’s kappa and Hubert’s kappa are multi-rater versions of Cohen’s kappa. Fleiss’ kappa is a multi-rater extension of Scott’s pi, whereas Randolph’s kappa generalizes the S coefficient of Bennett et al. to multiple raters. Although a consistent ordering of the numerical values of these agreement measures has frequently been observed in practice, no theoretical proof of a general ordering has so far been given. It is proved that Fleiss’ kappa is a lower bound of Hubert’s kappa and Randolph’s kappa, and that Randolph’s kappa is an upper bound of Hubert’s kappa and Light’s kappa if all pairwise agreement tables are weakly marginal symmetric, or if all raters assign a certain minimum proportion of the objects to a specified category.
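To make the four statistics and the proved ordering concrete, the sketch below computes them for a small, hypothetical ratings matrix using the standard formulations: mean pairwise observed agreement; pooled category proportions for Fleiss’ kappa; the uniform chance level 1/k for Randolph’s kappa; the averaged pairwise Cohen-type expected agreement for Hubert’s kappa; and the mean of the pairwise Cohen kappas for Light’s kappa. The function name and the toy data are illustrative and do not come from the paper; the paper’s notation may differ.

    import numpy as np
    from itertools import combinations

    def multi_rater_kappas(ratings):
        """Fleiss', Randolph's, Light's and Hubert's kappa for a ratings
        matrix of shape (n_objects, n_raters) with nominal labels.
        Standard formulations; illustrative sketch only."""
        ratings = np.asarray(ratings)
        n, m = ratings.shape
        # Categories inferred from the data; in practice the full
        # category set of the study should be supplied.
        cats = sorted(set(ratings.ravel().tolist()))
        k = len(cats)
        idx = {c: j for j, c in enumerate(cats)}

        # Rater-by-category marginal proportions p[g, j].
        p = np.zeros((m, k))
        for g in range(m):
            for label in ratings[:, g]:
                p[g, idx[label]] += 1.0 / n

        # Pairwise observed agreement, and its mean over rater pairs.
        pairs = list(combinations(range(m), 2))
        obs = {(g, h): np.mean(ratings[:, g] == ratings[:, h]) for g, h in pairs}
        P_o = np.mean(list(obs.values()))

        # Expected agreement under the different chance models.
        p_bar = p.mean(axis=0)              # category proportions pooled over raters
        Pe_fleiss = np.sum(p_bar ** 2)      # Scott-type (Fleiss' kappa)
        Pe_randolph = 1.0 / k               # uniform categories (Randolph's kappa)
        Pe_pair = {(g, h): np.dot(p[g], p[h]) for g, h in pairs}  # Cohen-type, per pair
        Pe_hubert = np.mean(list(Pe_pair.values()))               # averaged (Hubert's kappa)

        def corrected(po, pe):
            # Chance-corrected agreement (po - pe) / (1 - pe).
            return (po - pe) / (1.0 - pe)

        return {
            "Fleiss": corrected(P_o, Pe_fleiss),
            "Randolph": corrected(P_o, Pe_randolph),
            "Hubert": corrected(P_o, Pe_hubert),
            # Light's kappa is the average of the pairwise Cohen kappas.
            "Light": np.mean([corrected(obs[pr], Pe_pair[pr]) for pr in pairs]),
        }

    # Hypothetical example: 8 objects rated by 3 raters into 3 categories.
    ratings = [[1, 1, 2], [2, 2, 2], [3, 3, 3], [1, 2, 1],
               [2, 2, 3], [3, 3, 3], [1, 1, 1], [2, 2, 2]]
    print(multi_rater_kappas(ratings))

For any such table the output should respect the unconditional lower bounds proved in the paper (Fleiss’ kappa does not exceed Hubert’s kappa or Randolph’s kappa); the upper-bound results for Randolph’s kappa relative to Hubert’s and Light’s kappa hold only under the marginal conditions stated in the abstract.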

Keywords

Nominal agreement · Cohen’s kappa · Scott’s pi · Light’s kappa · Hubert’s kappa · Fleiss’ kappa · Randolph’s kappa · Cauchy–Schwarz inequality · Arithmetic-harmonic means inequality

Mathematics Subject Classification (2010)

62H17 · 62H20 · 62P25

Notes

Acknowledgments

The author thanks three anonymous reviewers for their helpful comments and valuable suggestions on earlier versions of this paper.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

References

  1. Artstein R, Poesio M (2005) Kappa3 = alpha (or beta). NLE Technical Note 05-1, University of Essex
  2. Banerjee M, Capozzoli M, McSweeney L, Sinha D (1999) Beyond kappa: a review of interrater agreement measures. Can J Stat 27: 3–23
  3. Bennett EM, Alpert R, Goldstein AC (1954) Communications through limited response questioning. Public Opin Q 18: 303–308
  4. Berry KJ, Mielke PW (1988) A generalization of Cohen’s kappa agreement measure to interval measurement and multiple raters. Educ Psychol Meas 48: 921–933
  5. Brennan RL, Prediger DJ (1981) Coefficient kappa: some uses, misuses, and alternatives. Educ Psychol Meas 41: 687–699
  6. Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20: 37–46
  7. Cohen J (1968) Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 70: 213–220
  8. Conger AJ (1980) Integration and generalization of kappas for multiple raters. Psychol Bull 88: 322–328
  9. Craig RT (1981) Generalization of Scott’s index of intercoder agreement. Public Opin Q 45: 260–264
  10. Davies M, Fleiss JL (1982) Measuring agreement for multinomial data. Biometrics 38: 1047–1051
  11. De Mast J (2007) Agreement and kappa-type indices. Am Stat 61: 148–153
  12. Di Eugenio B, Glass M (2004) The kappa statistic: a second look. Comput Linguist 30: 95–101
  13. Dou W, Ren Y, Wu Q, Ruan S, Chen Y, Bloyet D, Constans J-M (2007) Fuzzy kappa for the agreement measure of fuzzy classifications. Neurocomputing 70: 726–734
  14. Fleiss JL (1971) Measuring nominal scale agreement among many raters. Psychol Bull 76: 378–382
  15. Gwet KL (2008) Variance estimation of nominal-scale inter-rater reliability with random selection of raters. Psychometrika 73: 407–430
  16. Heuvelmans APJM, Sanders PF (1993) Beoordelaarsovereenstemming [Interrater agreement]. In: Eggen TJHM, Sanders PF (eds) Psychometrie in de Praktijk [Psychometrics in practice]. Cito Instituut voor Toetsontwikkeling, Arnhem, pp 443–470
  17. Hsu LM, Field R (2003) Interrater agreement measures: comments on kappa n, Cohen’s kappa, Scott’s π and Aickin’s α. Underst Stat 2: 205–219
  18. Hubert L (1977) Kappa revisited. Psychol Bull 84: 289–297
  19. Janes CL (1979) An extension of the random error coefficient of agreement to N × N tables. Br J Psychiatry 134: 617–619
  20. Janson H, Olsson U (2001) A measure of agreement for interval or nominal multivariate observations. Educ Psychol Meas 61: 277–289
  21. Janson S, Vegelius J (1979) On generalizations of the G index and the Phi coefficient to nominal scales. Multivar Behav Res 14: 255–269
  22. Kraemer HC (1979) Ramifications of a population model for κ as a coefficient of reliability. Psychometrika 44: 461–472
  23. Kraemer HC (1980) Extensions of the kappa coefficient. Biometrics 36: 207–216
  24. Kraemer HC, Periyakoil VS, Noda A (2002) Tutorial in biostatistics: kappa coefficients in medical research. Stat Med 21: 2109–2129
  25. Krippendorff K (1987) Association, agreement, and equity. Qual Quant 21: 109–123
  26. Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33: 159–174
  27. Light RJ (1971) Measures of response agreement for qualitative data: some generalizations and alternatives. Psychol Bull 76: 365–377
  28. Mitrinović DS (1964) Elementary inequalities. P. Noordhoff, Groningen
  29. O’Malley FP, Mohsin SK, Badve S, Bose S, Collins LC, Ennis M, Kleer CG, Pinder SE, Schnitt SJ (2006) Interobserver reproducibility in the diagnosis of flat epithelial atypia of the breast. Mod Pathol 19: 172–179
  30. Popping R (1983) Overeenstemmingsmaten voor nominale data [Agreement measures for nominal data]. PhD thesis, Rijksuniversiteit Groningen, Groningen
  31. Randolph JJ (2005) Free-marginal multirater kappa (multirater κ free): an alternative to Fleiss’ fixed-marginal multirater kappa. Paper presented at the Joensuu Learning and Instruction Symposium, Joensuu, Finland
  32. Schouten HJA (1980) Measuring agreement among many observers. Biom J 22: 497–504
  33. Schouten HJA (1982) Measuring pairwise agreement among many observers. Biom J 24: 431–435
  34. Schouten HJA (1986) Nominal scale agreement among observers. Psychometrika 51: 453–466
  35. Scott WA (1955) Reliability of content analysis: the case of nominal scale coding. Public Opin Q 19: 321–325
  36. Vanbelle S, Albert A (2009) A note on the linearly weighted kappa coefficient for ordinal scales. Stat Methodol 6: 157–163
  37. Warrens MJ (2008a) On similarity coefficients for 2 × 2 tables and correction for chance. Psychometrika 73: 487–502
  38. Warrens MJ (2008b) Bounds of resemblance measures for binary (presence/absence) variables. J Classif 25: 195–208
  39. Warrens MJ (2008c) On association coefficients for 2 × 2 tables and properties that do not depend on the marginal distributions. Psychometrika 73: 777–789
  40. Warrens MJ (2008d) On the equivalence of Cohen’s kappa and the Hubert-Arabie adjusted Rand index. J Classif 25: 177–183
  41. Warrens MJ (2008e) On the indeterminacy of resemblance measures for (presence/absence) data. J Classif 25: 125–136
  42. Warrens MJ (2010a) Inequalities between kappa and kappa-like statistics for k × k tables. Psychometrika 75: 176–185
  43. Warrens MJ (2010b) A formal proof of a paradox associated with Cohen’s kappa. J Classif (in press)
  44. Warrens MJ (2010c) Cohen’s kappa can always be increased and decreased by combining categories. Stat Methodol 7: 673–677
  45. Warrens MJ (2010d) A Kraemer-type rescaling that transforms the odds ratio into the weighted kappa coefficient. Psychometrika 75: 328–330
  46. Zwick R (1988) Another look at interrater agreement. Psychol Bull 103: 374–378

Copyright information

© The Author(s) 2010

Authors and Affiliations

  1. Unit Methodology and Statistics, Institute of Psychology, Leiden University, Leiden, The Netherlands
