
Fleiss’ kappa statistic without paradoxes


Fleiss’ kappa statistic is a well-known index for assessing the reliability of agreement between raters. It is widely used in both psychology and psychiatry. Unfortunately, kappa can behave paradoxically when agreement between raters is strong, taking lower values than one would expect. The aim of this paper is to propose a new method, based on permutation techniques, that avoids this paradox. We also study the problem of confidence intervals for kappa and, in particular, suggest using bootstrap confidence intervals that are free of paradoxes.
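The paradox described in the abstract is easy to reproduce. The sketch below assumes NumPy is available. It computes the standard Fleiss’ kappa from the usual subject-by-category count matrix and compares two hypothetical tables of 10 subjects rated by 3 raters. Both tables have the same observed agreement (about 93%), but in one of them nearly all ratings fall in a single category. That pushes chance-expected agreement so high that kappa drops slightly below zero. The sketch also includes an ordinary percentile bootstrap interval that resamples subjects. The function names (`fleiss_kappa`, `bootstrap_ci`) and the data are illustrative. Neither function implements the permutation-based, paradox-free method proposed in the paper, whose details are not given in this abstract.

```python
import numpy as np


def fleiss_kappa(counts):
    """Standard Fleiss' kappa (Fleiss, 1971) from an N x k count matrix.

    counts[i, j] is the number of raters who put subject i in category j;
    every row must sum to the same number of raters n >= 2.
    Returns nan when chance agreement P_e equals 1, where kappa is 0/0.
    """
    counts = np.asarray(counts, dtype=float)
    n_subjects = counts.shape[0]
    n_raters = counts[0].sum()
    if n_raters < 2 or not np.allclose(counts.sum(axis=1), n_raters):
        raise ValueError("every subject needs the same number (>= 2) of raters")
    # Share of all ratings that fall in each category.
    p_j = counts.sum(axis=0) / (n_subjects * n_raters)
    # Share of agreeing rater pairs within each subject.
    p_i = (np.sum(counts**2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()      # observed agreement
    p_e = np.sum(p_j**2)    # agreement expected by chance
    if np.isclose(p_e, 1.0):
        return float("nan")
    return float((p_bar - p_e) / (1.0 - p_e))


def bootstrap_ci(counts, n_boot=2000, alpha=0.05, seed=0):
    """Ordinary percentile bootstrap CI for kappa, resampling subjects (rows).

    Resamples where kappa is undefined (P_e == 1) are discarded; this is a
    simplifying assumption, not the paper's paradox-free procedure.
    """
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    n = counts.shape[0]
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # draw n subjects with replacement
        k = fleiss_kappa(counts[idx])
        if not np.isnan(k):
            stats.append(k)
    if not stats:
        raise ValueError("kappa was undefined in every bootstrap resample")
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


# Toy data (hypothetical): 10 subjects, 3 raters, 2 categories.
# Both tables have the same observed agreement, P_bar = 28/30 (about 0.93).
skewed = [[3, 0]] * 9 + [[2, 1]]                    # almost all in category 0
balanced = [[3, 0]] * 5 + [[0, 3]] * 4 + [[2, 1]]   # categories roughly even

print(round(fleiss_kappa(skewed), 4))    # -0.0345
print(round(fleiss_kappa(balanced), 4))  # 0.8643
print(bootstrap_ci(balanced))
```

Dropping resamples where kappa is undefined is a simplifying choice. On strongly skewed tables such resamples are common: for the skewed table above, about a third of them (0.9^10 ≈ 0.35) contain only unanimous category-0 subjects. The resulting interval would then describe only the remaining resamples.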




Author information


Correspondence to Rosa Falotico.


About this article

Cite this article

Falotico, R., Quatto, P. Fleiss’ kappa statistic without paradoxes. Qual Quant 49, 463–470 (2015).

Keywords

  • Inter-rater agreement
  • Fleiss’ kappa
  • Kappa paradoxes
  • Monte Carlo simulations
  • Bootstrap confidence intervals