Abstract
The area under the ROC curve (AUC) is a widely used measure for evaluating classification performance on heavily imbalanced data. Kernelized AUC maximization machines generalize better than their linear counterparts because they can model the complex nonlinear structure underlying most real-world data. However, their high training complexity renders kernelized AUC machines infeasible for large-scale data. In this paper, we present two nonlinear AUC maximization algorithms that optimize linear classifiers over a finite-dimensional feature space constructed via the k-means Nyström approximation. Our first algorithm maximizes the AUC metric by optimizing a pairwise squared hinge loss function with the truncated Newton method. This second-order batch method, however, becomes expensive to run on extremely large datasets. This motivates us to develop a first-order stochastic AUC maximization algorithm that incorporates a scheduled regularization update and scheduled averaging to accelerate the convergence of the classifier. Experiments on several benchmark datasets demonstrate that the proposed AUC classifiers are more efficient than kernelized AUC machines while matching or surpassing their AUC performance. We also show experimentally that the proposed stochastic AUC classifier reaches the optimal solution, whereas other state-of-the-art online and stochastic AUC maximization methods are prone to suboptimal convergence. Code related to this paper is available at: https://sites.google.com/view/majdikhalid/.
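To make the feature-construction step concrete, the sketch below builds a k-means Nyström feature map using NumPy and scikit-learn. The RBF kernel, the landmark count `n_landmarks`, and the bandwidth `gamma` are illustrative assumptions, not the configuration used in the paper; this is a minimal sketch of the general technique, not the authors' implementation.

```python
# A minimal sketch of a k-means Nystrom feature map (RBF kernel, landmark
# count, and gamma are assumptions for illustration, not the paper's setup).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def nystrom_features(X, n_landmarks=100, gamma=1.0, random_state=0):
    """Map X into a finite-dimensional space where a linear AUC machine runs."""
    # Landmarks are the k-means centroids (the "k-means Nystrom" choice).
    centers = KMeans(n_clusters=n_landmarks,
                     random_state=random_state).fit(X).cluster_centers_
    W = rbf_kernel(centers, centers, gamma=gamma)  # landmark-landmark kernel
    C = rbf_kernel(X, centers, gamma=gamma)        # data-landmark kernel
    vals, vecs = np.linalg.eigh(W)
    vals = np.maximum(vals, 1e-12)                 # guard near-zero eigenvalues
    # Z = C V diag(vals)^{-1/2}, so Z @ Z.T equals C W^{-1} C^T,
    # the Nystrom approximation of the full kernel matrix.
    return C @ (vecs / np.sqrt(vals))
```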
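Similarly, the sketch below shows a first-order stochastic pairwise AUC update over such features: each step samples one positive-negative pair and takes a gradient step on the L2-regularized pairwise squared hinge loss while maintaining a running average of the iterates. The paper's scheduled regularization update and scheduled averaging are simplified here to a standard decaying step size and plain iterate averaging, and the hyperparameters `lam` and `eta` are assumptions.

```python
# Simplified stochastic pairwise AUC maximization (sketch only; the paper's
# scheduled regularization and scheduled averaging are replaced by a standard
# step-size decay and a plain running average of the iterates).
import numpy as np

def stochastic_auc_train(Z_pos, Z_neg, lam=1e-4, eta=0.1, n_iter=100_000, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(Z_pos.shape[1])
    w_avg = np.zeros_like(w)
    for t in range(1, n_iter + 1):
        zp = Z_pos[rng.integers(len(Z_pos))]    # random positive example
        zn = Z_neg[rng.integers(len(Z_neg))]    # random negative example
        diff = zp - zn
        margin = w @ diff
        grad = lam * w                          # gradient of (lam/2)||w||^2
        if margin < 1.0:                        # pairwise squared hinge loss
            grad -= 2.0 * (1.0 - margin) * diff
        w -= (eta / (1.0 + lam * eta * t)) * grad
        w_avg += (w - w_avg) / t                # running iterate average
    return w_avg
```

Given features `Z` from the Nyström sketch above and labels `y`, a call such as `w = stochastic_auc_train(Z[y == 1], Z[y == 0])` yields a linear scorer; its AUC on held-out data is the fraction of correctly ordered positive-negative pairs.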
Cite this paper

Khalid, M., Ray, I., Chitsaz, H. (2019). Scalable Nonlinear AUC Maximization Methods. In: Berlingerio, M., Bonchi, F., Gärtner, T., Hurley, N., Ifrim, G. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2018. Lecture Notes in Computer Science, vol. 11052. Springer, Cham. https://doi.org/10.1007/978-3-030-10928-8_18

Publisher: Springer, Cham
Print ISBN: 978-3-030-10927-1
Online ISBN: 978-3-030-10928-8
© 2019 Springer Nature Switzerland AG