Feature uncertainty bounds for explicit feature maps and large robust nonlinear SVM classifiers

  • Nicolas Couellan
  • Sophie Jan
Article

Abstract

We consider the binary classification problem when data are large and subject to unknown but bounded uncertainties. We address the problem by formulating the nonlinear support vector machine training problem with robust optimization. To do so, we analyze and propose two bounding schemes for the uncertainties associated with random approximate features in low-dimensional spaces. The proposed bound calculations are based on the Random Fourier Features and Nyström methods. Numerical experiments are conducted to illustrate the benefit of the technique. We also emphasize the decomposable structure of the proposed robust nonlinear formulation, which allows the use of efficient stochastic approximation techniques when datasets are large.
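The two approximate feature maps named in the abstract can be sketched as follows. This is a minimal illustration of the standard Random Fourier Features and Nyström constructions for the Gaussian kernel, not the authors' code; all function names and parameters here are our own choices.

```python
import numpy as np

def random_fourier_features(X, n_components, gamma, rng):
    """Rahimi-Recht map z(x) = sqrt(2/D) * cos(W x + b), whose inner
    products approximate the Gaussian kernel exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # For this kernel, the frequencies are drawn from N(0, 2*gamma * I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_components))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_components)
    return np.sqrt(2.0 / n_components) * np.cos(X @ W + b)

def gaussian_kernel(A, B, gamma):
    """Exact Gaussian kernel matrix with entries exp(-gamma * ||a - b||^2)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def nystrom_features(X, landmarks, gamma):
    """Nystrom features K_nm @ K_mm^{-1/2}; their Gram matrix is the
    rank-m Nystrom approximation K_nm K_mm^{-1} K_mn of the full kernel."""
    K_mm = gaussian_kernel(landmarks, landmarks, gamma)
    K_nm = gaussian_kernel(X, landmarks, gamma)
    vals, vecs = np.linalg.eigh(K_mm)
    # Clip tiny eigenvalues for numerical stability before inverting.
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, 1e-12))) @ vecs.T
    return K_nm @ inv_sqrt
```

In both cases a nonlinear SVM on the original data reduces to a linear SVM on the explicit features, which is what makes the robust formulation tractable at scale.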

Keywords

Robust classification · Random Fourier features · Nyström method · Robust optimization · Support vector machines · Machine learning

Mathematics Subject Classification (2010)

62G35 · 90C25 · 68T99


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. ENAC, Université de Toulouse, Toulouse, France
  2. Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse, CNRS, UPS, Toulouse Cedex 9, France