Count on kappa

  • Special Series: Statistics in Molecular Modeling
  • Guest Editor: Anthony Nicholls
Journal of Computer-Aided Molecular Design

Abstract

In the 1960s, the kappa statistic was introduced to account for chance agreement in inter- and intra-rater reliability studies. It was widely adopted in the medical field, where it was applied successfully to compare diagnoses of identical patient groups. Kappa is well suited for classification tasks where ranking is not considered. Its main advantages are its simplicity and its general applicability to multi-class problems, which is the major difference from the area under the receiver operating characteristic curve (ROC AUC). In this manuscript, I outline the usage of kappa for classification tasks and evaluate its role and uses specifically in machine learning and cheminformatics.
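To make the idea of chance-corrected agreement concrete, the following is a minimal sketch of Cohen's kappa for a multi-class problem, using the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the marginal class frequencies. The function name `cohen_kappa`, the class labels, and the data are hypothetical and chosen only for illustration; they are not taken from the article.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of class labels.

    Illustrative helper (hypothetical name); works for any number of classes.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items on which both label sets agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over classes of the product of the two
    # marginal class frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    classes = count_a.keys() | count_b.keys()
    p_e = sum(count_a[c] * count_b[c] for c in classes) / (n * n)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical three-class example (made-up labels, not data from the paper).
y_true = ["active", "active", "inactive", "inactive",
          "toxic", "toxic", "active", "inactive"]
y_pred = ["active", "inactive", "inactive", "inactive",
          "toxic", "active", "active", "inactive"]

kappa = cohen_kappa(y_true, y_pred)
print(f"kappa = {kappa:.4f}")
```

In this toy example the two label sets agree on 6 of 8 items (p_o = 0.75), while the marginals give p_e = 23/64, so kappa is noticeably lower than the raw agreement. No ranking or score threshold is involved, which is why the same calculation carries over unchanged from two classes to many.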

Acknowledgments

I thank Christian Kramer (University of Innsbruck, Austria) for critical proofreading, useful suggestions, and the discussions initiated by this manuscript and my GRC talk. Furthermore, I acknowledge the fantastic assistance of Kim Branson (Hessian Informatics, San Francisco, USA). Without Kim, this paper and my GRC talk would have been less instructive. I would also like to thank Georgia McGaughey (Vertex Pharmaceuticals, Boston, USA) for her intense proofreading. Lastly, I would like to express my deepest gratitude to Anthony Nicholls (OpenEye Scientific Software, Santa Fe, USA), who reviewed the initial GRC contribution and this manuscript in great detail: this was really a heroic effort!

Author information

Corresponding author

Correspondence to Paul Czodrowski.

About this article

Cite this article

Czodrowski, P. Count on kappa. J Comput Aided Mol Des 28, 1049–1055 (2014). https://doi.org/10.1007/s10822-014-9759-6

  • DOI: https://doi.org/10.1007/s10822-014-9759-6
