Abstract
We consider the binary classification problem when data are large and subject to unknown but bounded uncertainties. We address the problem by formulating the nonlinear support vector machine training problem with robust optimization. To do so, we analyze and propose two bounding schemes for uncertainties associated with random approximate features in low-dimensional spaces. The proposed bound calculations are based on Random Fourier Features and the Nyström method. Numerical experiments are conducted to illustrate the benefit of the technique. We also emphasize the decomposable structure of the proposed robust nonlinear formulation, which allows the use of efficient stochastic approximation techniques when datasets are large.
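To make the low-dimensional approximate features concrete, the sketch below implements the Random Fourier Features map of Rahimi and Recht for the RBF kernel k(x, y) = exp(-γ‖x − y‖²); the function name and parameters are illustrative choices, not the authors' notation, and this is only a minimal sketch of the feature construction, not the robust SVM formulation itself.

```python
import numpy as np

def random_fourier_features(X, n_features=100, gamma=1.0, seed=0):
    """Map rows of X to a low-dimensional feature space whose inner
    products approximate the RBF kernel exp(-gamma * ||x - y||^2).

    Uses the map z(x) = sqrt(2/D) * cos(W^T x + b) with W drawn from
    the kernel's Fourier transform (a Gaussian) and b uniform on [0, 2*pi].
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # For exp(-gamma * ||x - y||^2), the spectral density is Gaussian
    # with standard deviation sqrt(2 * gamma) in each coordinate.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```

With this explicit map, a linear SVM trained on z(x) approximates the nonlinear kernel SVM, and the feature-space uncertainty bounds discussed in the paper apply to z(x) directly.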
Cite this article
Couellan, N., Jan, S. Feature uncertainty bounds for explicit feature maps and large robust nonlinear SVM classifiers. Ann Math Artif Intell 88, 269–289 (2020). https://doi.org/10.1007/s10472-019-09676-0
Keywords
- Robust classification
- Random Fourier features
- Nyström method
- Robust optimization
- Support vector machines
- Machine learning