Abstract
In the 1960s, the kappa statistic was introduced to correct for chance agreement in inter- and intra-rater reliability studies. It was strongly promoted in the medical field, where it was successfully applied to compare diagnoses made on identical patient groups. Kappa is well suited to classification tasks in which ranking is not considered. Its main advantages are its simplicity and its general applicability to multi-class problems, which is the major difference from the area under the receiver operating characteristic curve. In this manuscript, I outline the usage of kappa for classification tasks and evaluate its role and uses specifically in machine learning and cheminformatics.
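To make the idea of chance-corrected agreement concrete, the following is a minimal sketch of Cohen's kappa for a multi-class classification task, using the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the label frequencies. The function name `cohen_kappa` and the example labels are illustrative assumptions and are not taken from the manuscript.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of class labels.

    Hypothetical helper for illustration; not the manuscript's code.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equally long")

    n = len(labels_a)

    # Observed agreement p_o: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement p_e: sum over classes of the product of the
    # marginal label frequencies of the two sequences.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both sequences use a single identical class; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")

    return (observed - expected) / (1.0 - expected)


# Made-up three-class example (e.g. reference classes vs. model predictions).
reference = ["a", "a", "a", "b", "b", "b", "c", "c", "c", "c"]
predicted = ["a", "a", "b", "b", "b", "c", "c", "c", "c", "a"]

print(cohen_kappa(reference, predicted))
```

In this made-up example, 7 of 10 labels agree (p_o = 0.70), while the marginal frequencies give p_e = 0.34, so kappa is roughly 0.55, noticeably lower than the raw accuracy of 0.70.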
Acknowledgments
I thank Christian Kramer (University of Innsbruck, Austria) for critical proofreading, useful suggestions, and the discussions initiated by this manuscript and my GRC talk. Furthermore, the fantastic assistance of Kim Branson (Hessian Informatics, San Francisco, USA) is acknowledged. Without Kim, this paper and my GRC talk would have been less instructive. I would also like to thank Georgia McGaughey (Vertex Pharmaceuticals, Boston, USA) for her intensive proofreading. Lastly, I would like to express my deepest gratitude to Anthony Nicholls (OpenEye Scientific Software, Santa Fe, USA), who reviewed the initial GRC contribution and this manuscript in great detail: this was really a heroic effort!
Cite this article
Czodrowski, P. Count on kappa. J Comput Aided Mol Des 28, 1049–1055 (2014). https://doi.org/10.1007/s10822-014-9759-6
DOI: https://doi.org/10.1007/s10822-014-9759-6