
On the Noise Resilience of Ranking Measures

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 9948)


Performance measures play a pivotal role in the evaluation and selection of machine learning models for a wide range of applications. Using both synthetic and real-world data sets, we investigated the resilience to noise of various ranking measures. Our experiments revealed that the area under the ROC curve (AUC) and a related measure, the truncated average Kolmogorov-Smirnov statistic (taKS), can reliably discriminate between models with truly different performance under various types and levels of noise. With increasing class skew, however, the H-measure and estimators of the area under the precision-recall curve become preferable measures. Because of its simple graphical interpretation and robustness, the lower trapezoid estimator of the area under the precision-recall curve is recommended for highly imbalanced data sets.
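The measures discussed above can be computed directly from a ranking of classifier scores. As a minimal sketch (not the paper's implementation), the AUC equals the normalized Mann-Whitney statistic, i.e. the probability that a randomly chosen positive is ranked above a randomly chosen negative, and the area under the precision-recall curve can be approximated by trapezoids between successive empirical (recall, precision) points; the lower trapezoid estimator recommended here is a conservative variant of this construction. The function names below are illustrative, not from the paper.

```python
import numpy as np

def auc(scores, labels):
    """AUC via the Mann-Whitney statistic: the probability that a random
    positive instance is scored above a random negative one (ties count 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def aucpr_trapezoid(scores, labels):
    """Trapezoidal approximation of the area under the empirical
    precision-recall curve, sweeping the decision threshold from the
    highest score downward."""
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=int)[np.argsort(-scores)]
    tp = np.cumsum(y)            # true positives at each threshold
    fp = np.cumsum(1 - y)        # false positives at each threshold
    recall = tp / tp[-1]
    precision = tp / (tp + fp)
    # Anchor the curve at recall 0 with the first observed precision.
    recall = np.concatenate(([0.0], recall))
    precision = np.concatenate(([precision[0]], precision))
    widths = np.diff(recall)
    heights = (precision[1:] + precision[:-1]) / 2.0
    return float(np.sum(widths * heights))
```

For example, `auc([0.9, 0.8, 0.7, 0.6, 0.4, 0.3], [1, 1, 0, 1, 0, 0])` returns 8/9, since 8 of the 9 positive-negative pairs are ranked correctly.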


  • Ranking
  • Classification
  • Noise
  • Robustness
  • ROC curve
  • AUC
  • H-measure
  • taKS
  • Precision-recall curve

  • DOI: 10.1007/978-3-319-46672-9_6
  • Chapter length: 9 pages



Corresponding author

Correspondence to Daniel Berrar.


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Berrar, D. (2016). On the Noise Resilience of Ranking Measures. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol. 9948. Springer, Cham.



  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46671-2

  • Online ISBN: 978-3-319-46672-9

  • eBook Packages: Computer Science (R0)