Beyond Accuracy, F-Score and ROC: A Family of Discriminant Measures for Performance Evaluation

  • Conference paper
AI 2006: Advances in Artificial Intelligence (AI 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4304)

Abstract

Different evaluation measures assess different characteristics of machine learning algorithms. The empirical evaluation of algorithms and classifiers is a matter of on-going debate among researchers. Most measures in use today focus on a classifier’s ability to identify classes correctly. We note other useful properties, such as failure avoidance or class discrimination, and we suggest measures to evaluate such properties. These measures – Youden’s index, likelihood, Discriminant power – are used in medical diagnosis. We show that they are interrelated, and we apply them to a case study from the field of electronic negotiations. We also list other learning problems which may benefit from the application of these measures.
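
The three measures named in the abstract can be computed from a binary confusion matrix. The sketch below is a minimal illustration and is not taken from the paper, whose body is not available in this preview. It assumes the definitions commonly used in the medical-diagnosis literature: Youden's index as sensitivity + specificity - 1, the positive and negative likelihood ratios, and discriminant power as (sqrt(3)/pi) times the sum of the log-odds of sensitivity and specificity. The function name `discriminant_measures` and the example counts are hypothetical.

```python
import math


def discriminant_measures(tp, fn, fp, tn):
    """Compute Youden's index, likelihood ratios and discriminant power.

    Assumes the usual medical-diagnosis definitions (an assumption, not
    confirmed against the paper body). tp, fn, fp, tn are the counts of a
    binary confusion matrix.
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate

    # The ratios and log-odds below are undefined at exactly 0 or 1.
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly in (0, 1)")

    youden = sensitivity + specificity - 1
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    discriminant_power = (math.sqrt(3) / math.pi) * (
        math.log(sensitivity / (1 - sensitivity))
        + math.log(specificity / (1 - specificity))
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": youden,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
        "discriminant_power": discriminant_power,
    }


# Hypothetical counts, chosen only for illustration.
result = discriminant_measures(tp=80, fn=20, fp=10, tn=90)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions the discriminant power is the sum of the log positive likelihood ratio and the negated log negative likelihood ratio, scaled by sqrt(3)/pi, which is one way to see the interrelation the abstract mentions.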

We did this work while the first author was at the University of Ottawa. Partial support came from the Natural Sciences and Engineering Research Council of Canada.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sokolova, M., Japkowicz, N., Szpakowicz, S. (2006). Beyond Accuracy, F-Score and ROC: A Family of Discriminant Measures for Performance Evaluation. In: Sattar, A., Kang, Bh. (eds) AI 2006: Advances in Artificial Intelligence. AI 2006. Lecture Notes in Computer Science, vol 4304. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11941439_114

  • DOI: https://doi.org/10.1007/11941439_114

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49787-5

  • Online ISBN: 978-3-540-49788-2

  • eBook Packages: Computer Science, Computer Science (R0)
