Reliability and agreement studies are of paramount importance. They contribute to the quality of research by quantifying the amount of error inherent in any diagnosis, score, or measurement. Guidelines for reporting reliability and agreement studies were recently proposed. While the use of kappa-type coefficients is advised for categorical and ordinal scales, no further guidance is given on the choice of a weighting scheme. In the present paper, a new, simple, and practical interpretation of the linearly and quadratically weighted kappa coefficients is given. This will help researchers motivate their choice of a weighting scheme.
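As an illustration of the two weighting schemes discussed in the abstract (this sketch is not taken from the article itself), Cohen's weighted kappa with linear (Cicchetti–Allison) and quadratic (Fleiss–Cohen) weights can be computed from a square agreement table as follows; the function name and interface are hypothetical:

```python
import numpy as np

def weighted_kappa(table, scheme="linear"):
    """Cohen's weighted kappa for a square agreement table.

    table[i, j] = number of subjects placed in category i by rater 1
    and category j by rater 2. `scheme` selects the weighting:
    'linear' (Cicchetti-Allison) or 'quadratic' (Fleiss-Cohen).
    """
    table = np.asarray(table, dtype=float)
    k = table.shape[0]
    p = table / table.sum()               # observed joint proportions
    row = p.sum(axis=1)                   # marginal distribution, rater 1
    col = p.sum(axis=0)                   # marginal distribution, rater 2
    i, j = np.indices((k, k))
    d = np.abs(i - j) / (k - 1)           # normalized category distance
    if scheme == "quadratic":
        d = d ** 2
    w = 1.0 - d                           # agreement weights in [0, 1]
    po = (w * p).sum()                    # weighted observed agreement
    pe = (w * np.outer(row, col)).sum()   # weighted chance agreement
    return (po - pe) / (1.0 - pe)
```

For a perfectly diagonal table the coefficient equals 1 under either scheme, and for a tridiagonal table the quadratic weighting yields a larger value than the linear one, consistent with Warrens (2012).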
This research is part of project 451-13-002 funded by the Netherlands Organisation for Scientific Research. The author thanks three anonymous reviewers and the associate editor for their helpful comments and valuable suggestions on an earlier version of this article.
Vanbelle, S. A New Interpretation of the Weighted Kappa Coefficients. Psychometrika 81, 399–410 (2016). https://doi.org/10.1007/s11336-014-9439-4
Keywords: ordinal scale