On the Noise Resilience of Ranking Measures

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9948)


Abstract

Performance measures play a pivotal role in the evaluation and selection of machine learning models for a wide range of applications. Using both synthetic and real-world data sets, we investigated the resilience to noise of various ranking measures. Our experiments revealed that the area under the ROC curve (AUC) and a related measure, the truncated average Kolmogorov-Smirnov statistic (taKS), can reliably discriminate between models with truly different performance under various types and levels of noise. With increasing class skew, however, the H-measure and estimators of the area under the precision-recall curve become preferable measures. Because of its simple graphical interpretation and robustness, the lower trapezoid estimator of the area under the precision-recall curve is recommended for highly imbalanced data sets.


Keywords: Ranking · Classification · Noise · Robustness · ROC curve · AUC · H-measure · taKS · Precision-recall curve
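As background for the abstract above: the AUC equals the probability that a randomly chosen positive instance is ranked above a randomly chosen negative one, with ties counted as one half. The following minimal, dependency-free sketch computes it directly from this pairwise (Mann-Whitney) interpretation; the function name and O(n²) pairwise loop are illustrative choices, not the paper's implementation.

```python
def auc(scores, labels):
    """Rank-based AUC: fraction of positive-negative pairs in which the
    positive instance receives the higher score, counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0    # positive ranked above negative
            elif p == n:
                wins += 0.5    # tie contributes one half
    return wins / (len(pos) * len(neg))
```

For large samples, the same quantity is usually computed in O(n log n) from the rank sum of the positives rather than by explicit pair enumeration.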



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

School of Arts and Sciences, College of Engineering, Shibaura Institute of Technology, Minuma-ku, Japan
