
A semismooth Newton method for support vector classification and regression

Computational Optimization and Applications (2019)

Abstract

The support vector machine (SVM) is a fundamental technique in machine learning. In this paper, we apply a semismooth Newton method to solve two typical SVM models: the L2-loss SVC model and the \(\epsilon \)-L2-loss SVR model. The semismooth Newton method is widely used in the optimization community and is commonly regarded as enjoying a fast convergence rate at the price of high computational complexity per iteration. Our contribution is to exploit the sparse structure of the models, significantly reducing the computational complexity while retaining the quadratic convergence rate. Extensive numerical experiments demonstrate the outstanding performance of the semismooth Newton method, especially on large-scale problems (for the news20.binary problem, with 19,996 samples and 1,355,191 features, it takes only 3 s). In particular, for the \(\epsilon \)-L2-loss SVR model, the semismooth Newton method significantly outperforms the leading solvers, including DCD and TRON.
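For reference, the two models treated in the paper are commonly formulated as follows (these are the standard linear-SVM formulations from the LIBLINEAR literature; the paper's scaling may differ slightly):

\[ \min_{w}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\max\bigl(0,\,1-y_i w^{\top}x_i\bigr)^2 \quad \text{(L2-loss SVC)}, \]

\[ \min_{w}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\max\bigl(0,\,|w^{\top}x_i - y_i|-\epsilon\bigr)^2 \quad (\epsilon\text{-L2-loss SVR}). \]

Both objectives are once continuously differentiable with a semismooth gradient, so a semismooth Newton method applies. The sketch below is not the authors' code; all names, tolerances, and parameter choices are illustrative. It shows the structural point the abstract alludes to for the SVC model: an element of the generalized Hessian is \(I + 2C X_A^{\top} X_A\), where \(A=\{i: 1-y_i w^{\top}x_i>0\}\) is the active set, so matrix-free Newton systems touch only the active samples.

```python
import numpy as np

def ssn_l2svc(X, y, C=1.0, tol=1e-8, max_iter=50):
    """Minimal semismooth Newton sketch for
    min_w 0.5*||w||^2 + C * sum_i max(0, 1 - y_i * w^T x_i)^2."""
    n, d = X.shape
    w = np.zeros(d)
    obj = lambda v: 0.5 * v @ v + C * np.sum(np.maximum(0.0, 1.0 - y * (X @ v)) ** 2)
    for _ in range(max_iter):
        m = 1.0 - y * (X @ w)                      # margins
        A = m > 0                                  # active samples
        grad = w - 2.0 * C * (X[A].T @ (y[A] * m[A]))
        if np.linalg.norm(grad) < tol:
            break
        XA = X[A]
        # Generalized Hessian-vector product (I + 2C * XA^T XA) v:
        # only the |A| active rows of X are touched, which is the source
        # of the computational savings when the active set is small.
        Hv = lambda v: v + 2.0 * C * (XA.T @ (XA @ v))
        p = cg(Hv, -grad)                          # Newton direction
        t, f0 = 1.0, obj(w)                        # Armijo backtracking
        while obj(w + t * p) > f0 + 1e-4 * t * (grad @ p):
            t *= 0.5
        w = w + t * p
    return w

def cg(Hv, b, tol=1e-10, max_iter=200):
    """Plain conjugate gradient for the SPD system H x = b, given v -> H v."""
    x = np.zeros_like(b)
    r = b - Hv(x)
    p, rs = r.copy(), r @ r
    for _ in range(max_iter):
        Hp = Hv(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

The same template covers the \(\epsilon \)-L2-loss SVR model with residuals \(r_i = w^{\top}x_i - y_i\) and active set \(\{i: |r_i| > \epsilon \}\).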


Notes

  1. We became aware of this work while drafting our paper.

  2. We use the software LIBLINEAR version 2.11, downloaded from https://www.csie.ntu.edu.tw/~cjlin/liblinear/


Author information

Corresponding author

Correspondence to Qingna Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This author's research was supported by the National Natural Science Foundation of China (No. 11671036).


Cite this article

Yin, J., Li, Q. A semismooth Newton method for support vector classification and regression. Comput Optim Appl 73, 477–508 (2019). https://doi.org/10.1007/s10589-019-00075-z

