
Scalable Nonlinear AUC Maximization Methods

Part of the Lecture Notes in Computer Science book series (LNAI, volume 11052)

Abstract

The area under the ROC curve (AUC) is a widely used measure for evaluating classification performance on heavily imbalanced data. Kernelized AUC maximization machines generalize better than their linear counterparts because they can model the complex nonlinear structures underlying most real-world data. However, their high training complexity makes kernelized AUC machines infeasible for large-scale data. In this paper, we present two nonlinear AUC maximization algorithms that optimize linear classifiers over a finite-dimensional feature space constructed via the k-means Nyström approximation. Our first algorithm maximizes the AUC metric by optimizing a pairwise squared hinge loss with the truncated Newton method. This second-order batch method, however, becomes expensive for extremely large datasets, which motivates our second contribution: a first-order stochastic AUC maximization algorithm that incorporates a scheduled regularization update and scheduled averaging to accelerate the convergence of the classifier. Experiments on several benchmark datasets demonstrate that the proposed AUC classifiers are more efficient than kernelized AUC machines while surpassing, or at least matching, their AUC performance. We also show experimentally that the proposed stochastic AUC classifier reaches the optimal solution, whereas other state-of-the-art online and stochastic AUC maximization methods are prone to suboptimal convergence. Code related to this paper is available at: https://sites.google.com/view/majdikhalid/.
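The abstract describes two concrete ingredients: an explicit feature map built with the k-means Nyström approximation, and a first-order stochastic pairwise solver with scheduled regularization and scheduled averaging. The authors' implementation is linked above; purely as an illustration of the first ingredient, below is a minimal Python sketch of a k-means Nyström feature map for an RBF kernel. The function name, landmark count, and kernel width are assumptions made for the sketch, not details taken from the paper.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.spatial.distance import cdist

def kmeans_nystrom_features(X, n_landmarks=200, gamma=1.0, seed=0):
    """Explicit features whose inner products approximate an RBF kernel.

    Landmarks are chosen as k-means centroids (the k-means Nystrom
    scheme) rather than uniformly sampled training points.
    NOTE: an illustrative sketch; parameter choices are assumptions.
    """
    # Landmark selection: k-means centroids of the input data.
    landmarks, _ = kmeans2(X, n_landmarks, minit='++', seed=seed)

    # Kernel blocks: data-to-landmarks (C) and landmark-to-landmark (W).
    C = np.exp(-gamma * cdist(X, landmarks, 'sqeuclidean'))
    W = np.exp(-gamma * cdist(landmarks, landmarks, 'sqeuclidean'))

    # Feature map Phi = C @ W^{-1/2}, so Phi @ Phi.T approximates the
    # full n x n kernel matrix using only m = n_landmarks columns.
    s, U = np.linalg.eigh(W)
    s = np.maximum(s, 1e-12)  # guard against tiny/negative eigenvalues
    return C @ (U / np.sqrt(s)) @ U.T
```

For the second ingredient, a similarly hedged sketch of the stochastic solver: each step samples one positive/negative pair, descends the pairwise squared hinge loss, applies the L2 shrinkage only at scheduled intervals, and averages only the tail of the iterate trajectory, in the spirit of Polyak-Juditsky averaging. The burn-in point, regularization period, and step-size schedule below are illustrative choices, not the schedules used in the paper.

```python
import numpy as np

def stochastic_auc_sgd(Phi, y, lam=1e-4, eta0=1.0, epochs=5,
                       avg_start=None, reg_every=16, seed=0):
    """First-order stochastic AUC maximization on explicit features Phi.

    NOTE: a minimal sketch; the schedules below are assumptions.
    """
    rng = np.random.default_rng(seed)
    pos, neg = np.where(y > 0)[0], np.where(y <= 0)[0]
    w = np.zeros(Phi.shape[1])
    w_avg, n_avg = np.zeros_like(w), 0
    T = epochs * min(len(pos), len(neg))
    avg_start = T // 2 if avg_start is None else avg_start

    for t in range(1, T + 1):
        xp, xn = Phi[rng.choice(pos)], Phi[rng.choice(neg)]
        eta = eta0 / (1.0 + lam * eta0 * t)   # decaying step size

        # Pairwise squared hinge loss: max(0, 1 - w.(xp - xn))^2.
        margin = 1.0 - w @ (xp - xn)
        if margin > 0:
            w += eta * 2.0 * margin * (xp - xn)

        # Scheduled regularization: amortized L2 shrinkage.
        if t % reg_every == 0:
            w *= max(0.0, 1.0 - eta * lam * reg_every)

        # Scheduled averaging: average only after the burn-in step.
        if t >= avg_start:
            n_avg += 1
            w_avg += (w - w_avg) / n_avg

    return w_avg
```

A typical call chains the two sketches, e.g. w = stochastic_auc_sgd(kmeans_nystrom_features(X_train), y_train). In a real implementation the landmarks and W^{-1/2} factor would be retained so test points can be mapped consistently; scoring then costs one feature map evaluation plus a dot product, which is what keeps the nonlinear classifier cheap at prediction time.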


Notes

  1. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.

  2. http://archive.ics.uci.edu/ml/index.php.



Author information

Corresponding author: Majdi Khalid.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Khalid, M., Ray, I., Chitsaz, H. (2019). Scalable Nonlinear AUC Maximization Methods. In: Berlingerio, M., Bonchi, F., Gärtner, T., Hurley, N., Ifrim, G. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2018. Lecture Notes in Computer Science (LNAI), vol. 11052. Springer, Cham. https://doi.org/10.1007/978-3-030-10928-8_18


  • DOI: https://doi.org/10.1007/978-3-030-10928-8_18


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-10927-1

  • Online ISBN: 978-3-030-10928-8

  • eBook Packages: Computer Science (R0)