  • Regular Article
  • Open Access
  • Published: 10 October 2010

Inequalities between multi-rater kappas

  • Matthijs J. Warrens1 

Advances in Data Analysis and Classification volume 4, pages 271–286 (2010)

Abstract

The paper presents inequalities between four descriptive statistics that have been used to measure the nominal agreement between two or more raters. Each of the four statistics is a function of the pairwise information. Light’s kappa and Hubert’s kappa are multi-rater versions of Cohen’s kappa. Fleiss’ kappa is a multi-rater extension of Scott’s pi, whereas Randolph’s kappa generalizes Bennett et al.’s S to multiple raters. Although a consistent ordering of the numerical values of these agreement measures has frequently been observed in practice, no general ordering inequality among them has so far been proved. It is proved here that Fleiss’ kappa is a lower bound of both Hubert’s kappa and Randolph’s kappa, and that Randolph’s kappa is an upper bound of Hubert’s kappa and Light’s kappa if all pairwise agreement tables are weakly marginal symmetric or if all raters assign a certain minimum proportion of the objects to a specified category.
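
The full text is behind the article link, but the four statistics named in the abstract have standard definitions in the agreement literature: Light’s kappa is the mean of the pairwise Cohen’s kappas, Hubert’s kappa combines the mean pairwise observed and chance agreements, Fleiss’ kappa uses chance agreement from the marginals pooled over raters, and Randolph’s kappa uses a uniform chance term 1/q. The Python sketch below is not taken from the paper; the function name, toy data, and tolerances are illustrative. It computes the four coefficients from a raters-by-objects table of nominal codes and checks numerically the two unconditional inequalities stated in the abstract, namely that Fleiss’ kappa does not exceed Hubert’s kappa or Randolph’s kappa.

from itertools import combinations
import numpy as np

def multi_rater_kappas(ratings, q):
    """ratings: (m raters x n objects) array of nominal codes 0..q-1."""
    m, n = ratings.shape
    pairs = list(combinations(range(m), 2))
    # each rater's marginal distribution over the q categories
    marg = np.array([[np.mean(ratings[i] == c) for c in range(q)] for i in range(m)])
    # observed and chance agreement for every pair of raters
    po_pair = [np.mean(ratings[i] == ratings[j]) for i, j in pairs]
    pe_pair = [float(marg[i] @ marg[j]) for i, j in pairs]
    po, pe_hubert = np.mean(po_pair), np.mean(pe_pair)
    light = np.mean([(o - e) / (1 - e) for o, e in zip(po_pair, pe_pair)])  # mean of pairwise Cohen's kappas
    hubert = (po - pe_hubert) / (1 - pe_hubert)
    pooled = marg.mean(axis=0)                     # marginals pooled over raters (Scott/Fleiss chance model)
    pe_fleiss = float(pooled @ pooled)
    fleiss = (po - pe_fleiss) / (1 - pe_fleiss)
    randolph = (po - 1 / q) / (1 - 1 / q)          # uniform (free-marginal) chance model
    return light, hubert, fleiss, randolph

rng = np.random.default_rng(0)
ratings = rng.integers(0, 3, size=(4, 50))         # toy data: 4 raters, 50 objects, 3 categories
L, H, F, R = multi_rater_kappas(ratings, q=3)
print(f"Light {L:.3f}  Hubert {H:.3f}  Fleiss {F:.3f}  Randolph {R:.3f}")
print("Fleiss <= Hubert:", F <= H + 1e-12, " Fleiss <= Randolph:", F <= R + 1e-12)

The conditional results (Randolph’s kappa as an upper bound of Hubert’s and Light’s kappa) require weak marginal symmetry or the minimum-proportion condition and need not hold for arbitrary data, so the sketch only illustrates the two unconditional bounds.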


References

  • Artstein R, Poesio M (2005) Kappa3 = Alpha (or Beta). NLE Technical Note 05-1, University of Essex

  • Banerjee M, Capozzoli M, McSweeney L, Sinha D (1999) Beyond kappa: a review of interrater agreement measures. Can J Stat 27: 3–23

  • Bennett EM, Alpert R, Goldstein AC (1954) Communications through limited response questioning. Public Opin Q 18: 303–308

  • Berry KJ, Mielke PW (1988) A generalization of Cohen’s kappa agreement measure to interval measurement and multiple raters. Educ Psychol Meas 48: 921–933

  • Brennan RL, Prediger DJ (1981) Coefficient kappa: some uses, misuses, and alternatives. Educ Psychol Meas 41: 687–699

  • Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20: 37–46

  • Cohen J (1968) Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 70: 213–220

  • Conger AJ (1980) Integration and generalization of kappas for multiple raters. Psychol Bull 88: 322–328

  • Craig RT (1981) Generalization of Scott’s index of intercoder agreement. Public Opin Q 45: 260–264

  • Davies M, Fleiss JL (1982) Measuring agreement for multinomial data. Biometrics 38: 1047–1051

  • De Mast J (2007) Agreement and kappa-type indices. Am Stat 61: 148–153

  • Di Eugenio B, Glass M (2004) The kappa statistic: a second look. Comput Linguist 30: 95–101

  • Dou W, Ren Y, Wu Q, Ruan S, Chen Y, Bloyet D, Constans J-M (2007) Fuzzy kappa for the agreement measure of fuzzy classifications. Neurocomputing 70: 726–734

  • Fleiss JL (1971) Measuring nominal scale agreement among many raters. Psychol Bull 76: 378–382

  • Gwet KL (2008) Variance estimation of nominal-scale inter-rater reliability with random selection of raters. Psychometrika 73: 407–430

  • Heuvelmans APJM, Sanders PF (1993) Beoordelaarsovereenstemming. In: Eggen TJHM, Sanders PF (eds) Psychometrie in de Praktijk. Cito Instituut voor Toetsontwikkeling, Arnhem, pp 443–470

  • Hsu LM, Field R (2003) Interrater agreement measures: comments on kappa n, Cohen’s kappa, Scott’s π and Aickin’s α. Underst Stat 2: 205–219

  • Hubert L (1977) Kappa revisited. Psychol Bull 84: 289–297

  • Janes CL (1979) An extension of the random error coefficient of agreement to N × N tables. Br J Psychiatry 134: 617–619

  • Janson H, Olsson U (2001) A measure of agreement for interval or nominal multivariate observations. Educ Psychol Meas 61: 277–289

  • Janson S, Vegelius J (1979) On generalizations of the G index and the Phi coefficient to nominal scales. Multivar Behav Res 14: 255–269

  • Kraemer HC (1979) Ramifications of a population model for κ as a coefficient of reliability. Psychometrika 44: 461–472

  • Kraemer HC (1980) Extensions of the kappa coefficient. Biometrics 36: 207–216

  • Kraemer HC, Periyakoil VS, Noda A (2002) Tutorial in biostatistics: kappa coefficients in medical research. Stat Med 21: 2109–2129

  • Krippendorff K (1987) Association, agreement, and equity. Qual Quant 21: 109–123

  • Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33: 159–174

  • Light RJ (1971) Measures of response agreement for qualitative data: some generalizations and alternatives. Psychol Bull 76: 365–377

  • Mitrinović DS (1964) Elementary inequalities. P. Noordhoff, Groningen

  • O’Malley FP, Mohsin SK, Badve S, Bose S, Collins LC, Ennis M, Kleer CG, Pinder SE, Schnitt SJ (2006) Interobserver reproducibility in the diagnosis of flat epithelial atypia of the breast. Mod Pathol 19: 172–179

  • Popping R (1983) Overeenstemmingsmaten voor nominale data. PhD thesis, Rijksuniversiteit Groningen, Groningen

  • Randolph JJ (2005) Free-marginal multirater kappa (multirater κ free): an alternative to Fleiss’ fixed-marginal multirater kappa. Paper presented at the Joensuu Learning and Instruction Symposium, Joensuu, Finland

  • Schouten HJA (1980) Measuring agreement among many observers. Biom J 22: 497–504

  • Schouten HJA (1982) Measuring pairwise agreement among many observers. Biom J 24: 431–435

  • Schouten HJA (1986) Nominal scale agreement among observers. Psychometrika 51: 453–466

  • Scott WA (1955) Reliability of content analysis: the case of nominal scale coding. Public Opin Q 19: 321–325

  • Vanbelle S, Albert A (2009) A note on the linearly weighted kappa coefficient for ordinal scales. Stat Methodol 6: 157–163

  • Warrens MJ (2008a) On similarity coefficients for 2 × 2 tables and correction for chance. Psychometrika 73: 487–502

  • Warrens MJ (2008b) Bounds of resemblance measures for binary (presence/absence) variables. J Classif 25: 195–208

  • Warrens MJ (2008c) On association coefficients for 2 × 2 tables and properties that do not depend on the marginal distributions. Psychometrika 73: 777–789

  • Warrens MJ (2008d) On the equivalence of Cohen’s kappa and the Hubert-Arabie adjusted Rand index. J Classif 25: 177–183

  • Warrens MJ (2008e) On the indeterminacy of resemblance measures for (presence/absence) data. J Classif 25: 125–136

  • Warrens MJ (2010a) Inequalities between kappa and kappa-like statistics for k × k tables. Psychometrika 75: 176–185

  • Warrens MJ (2010b) A formal proof of a paradox associated with Cohen’s kappa. J Classif (in press)

  • Warrens MJ (2010c) Cohen’s kappa can always be increased and decreased by combining categories. Stat Methodol 7: 673–677

  • Warrens MJ (2010d) A Kraemer-type rescaling that transforms the odds ratio into the weighted kappa coefficient. Psychometrika 75: 328–330

  • Zwick R (1988) Another look at interrater agreement. Psychol Bull 103: 374–378

Acknowledgments

The author thanks three anonymous reviewers for their helpful comments and valuable suggestions on earlier versions of this paper.

Author information

Authors and Affiliations

  1. Unit Methodology and Statistics, Institute of Psychology, Leiden University, P.O. Box 9555, 2300 RB, Leiden, The Netherlands

    Matthijs J. Warrens

Corresponding author

Correspondence to Matthijs J. Warrens.

Rights and permissions

Open Access. This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Warrens, M.J. Inequalities between multi-rater kappas. Adv Data Anal Classif 4, 271–286 (2010). https://doi.org/10.1007/s11634-010-0073-4

  • Received: 26 October 2009

  • Revised: 06 September 2010

  • Accepted: 23 September 2010

  • Published: 10 October 2010

  • Issue Date: December 2010

  • DOI: https://doi.org/10.1007/s11634-010-0073-4

Keywords

  • Nominal agreement
  • Cohen’s kappa
  • Scott’s pi
  • Light’s kappa
  • Hubert’s kappa
  • Fleiss’ kappa
  • Randolph’s kappa
  • Cauchy–Schwarz inequality
  • Arithmetic-harmonic means inequality

Mathematics Subject Classification (2010)

  • 62H17
  • 62H20
  • 62P25