On the Noise Resilience of Ranking Measures

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 9948)

Abstract

Performance measures play a pivotal role in the evaluation and selection of machine learning models for a wide range of applications. Using both synthetic and real-world data sets, we investigated the resilience to noise of various ranking measures. Our experiments revealed that the area under the ROC curve (AUC) and a related measure, the truncated average Kolmogorov-Smirnov statistic (taKS), can reliably discriminate between models with truly different performance under various types and levels of noise. With increasing class skew, however, the H-measure and estimators of the area under the precision-recall curve become preferable measures. Because of its simple graphical interpretation and robustness, the lower trapezoid estimator of the area under the precision-recall curve is recommended for highly imbalanced data sets.
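To make the notion of noise resilience concrete, the following minimal sketch computes the AUC as the fraction of positive-negative pairs that a scoring model ranks correctly (ties count one half) and then recomputes it after a single label is flipped. The function name `auc`, the toy scores, and the choice of which label to flip are illustrative assumptions; they are not taken from the paper and do not reproduce its experimental protocol.

```python
from itertools import product


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative (ties count 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: model scores and true class labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

# Simulated label noise (an assumption for illustration): the lowest-scored
# negative is mislabeled as a positive.
noisy_labels = labels[:-1] + [1]

print(auc(scores, labels))        # AUC on the clean labels
print(auc(scores, noisy_labels))  # AUC after one label flip
```

On this toy data, one mislabeled case lowers the AUC from 15/16 to 11/15, which illustrates why a measure's sensitivity to noise matters when models are compared on imperfect labels.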

Keywords

  • Ranking
  • Classification
  • Noise
  • Robustness
  • ROC curve
  • AUC
  • H-measure
  • taKS
  • Precision-recall curve


Author information

Corresponding author

Correspondence to Daniel Berrar.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Berrar, D. (2016). On the Noise Resilience of Ranking Measures. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol 9948. Springer, Cham. https://doi.org/10.1007/978-3-319-46672-9_6

  • DOI: https://doi.org/10.1007/978-3-319-46672-9_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46671-2

  • Online ISBN: 978-3-319-46672-9

  • eBook Packages: Computer Science, Computer Science (R0)