Non-parametric Online AUC Maximization

  • Balázs Szörényi
  • Snir Cohen
  • Shie Mannor
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10535)

Abstract

We consider the problems of online and one-pass maximization of the area under the ROC curve (AUC). AUC maximization is hard even in the offline setting, so solutions often make compromises. Existing results for the online problem typically optimize a proxy defined via surrogate losses instead of maximizing the real AUC. This approach is justified by results showing that the optimum of these proxies, over the set of all (measurable) functions, maximizes the AUC. The problem is that, in order to meet the strong requirements on per-round running time, online methods typically work with restricted hypothesis classes. As we show, this restriction corrupts the above compatibility and causes the methods to converge to suboptimal solutions even in some simple stochastic cases. To remedy this, we propose a different approach and show that it leads to asymptotic optimality. Our theoretical claims and considerations are tested by experiments on real datasets, which provide empirical support for them.
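To make the distinction between the real AUC and a surrogate proxy concrete, the following is a minimal illustrative sketch, not code from the paper. It computes the empirical AUC as the fraction of correctly ordered positive-negative pairs, alongside a pairwise hinge loss as one example of a surrogate. The function names and the choice of the hinge loss are assumptions made here for illustration only.

```python
def _split_scores(scores, labels):
    """Split scores into those of positive (label 1) and negative examples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    return pos, neg


def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos, neg = _split_scores(scores, labels)
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def pairwise_hinge_surrogate(scores, labels):
    """Average of max(0, 1 - (s_pos - s_neg)) over all (positive, negative) pairs.

    Hypothetical example of a surrogate loss; other surrogates are possible.
    """
    pos, neg = _split_scores(scores, labels)
    total = sum(max(0.0, 1.0 - (p - n)) for p in pos for n in neg)
    return total / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.1]
    labels = [1, 0, 1, 0]
    print(empirical_auc(scores, labels))  # 0.75: three of four pairs are ordered correctly
    print(pairwise_hinge_surrogate(scores, labels))
```

The AUC depends only on the ordering of the scores, whereas the surrogate also depends on their margins, so a scoring function that minimizes the surrogate within a restricted class need not be the one that maximizes the AUC within that class.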

Notes

Acknowledgements

This research was supported in part by the European Communities Seventh Framework Programme (FP7/2007-2013) under grant agreement 306638 (SUPREL).

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Technion, Haifa, Israel
  2. Research Group on AI, Hungarian Academy of Sciences, University of Szeged, Szeged, Hungary