
A fast algorithm for training support vector regression via smoothed primal function minimization

Abstract

The support vector regression (SVR) model is usually fitted by solving a quadratic programming problem, which is computationally expensive. To improve computational efficiency, we propose to directly minimize the objective function in its primal form. However, the loss function used by SVR is not differentiable, which prevents well-developed gradient-based optimization methods from being applicable. We therefore introduce a smooth function to approximate the original loss function in the primal form of SVR, which transforms the original quadratic program into a convex unconstrained minimization problem. We discuss the properties of the proposed smoothed objective function and prove that the solution of the smoothly approximated model converges to the original SVR solution. A conjugate gradient algorithm is designed to minimize the proposed smoothed objective function in a sequential minimization manner. Extensive experiments on real-world datasets show that, compared to quadratic programming based SVR, the proposed approach achieves similar prediction accuracy with significantly improved computational efficiency; specifically, it is hundreds of times faster for the linear SVR model and several times faster for the nonlinear SVR model.
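To illustrate the idea for the linear case, the following is a minimal sketch, not the paper's implementation: it assumes a common softplus-type smoothing of the plus function max(0, x) with smoothing parameter tau (here larger tau means a tighter approximation; the paper's convention for τ may differ), and it uses SciPy's general-purpose conjugate-gradient solver rather than the specially designed algorithm described in the paper. The parameter names eps, C, and tau mirror the notation listed below.

```python
import numpy as np
from scipy.optimize import minimize

def smoothed_eps_loss(r, eps=0.1, tau=50.0):
    """Smooth approximation of the eps-insensitive loss max(0, |r| - eps)."""
    def plus(x):
        # softplus-based smoothing of max(0, x); tightens as tau grows
        return x + np.logaddexp(0.0, -tau * x) / tau
    return plus(r - eps) + plus(-r - eps)

def primal_objective(theta, X, y, C=1.0, eps=0.1, tau=50.0):
    """Smoothed primal SVR objective: 0.5*||w||^2 + C * sum of smoothed losses."""
    w, b = theta[:-1], theta[-1]
    residuals = y - X @ w - b
    return 0.5 * w @ w + C * np.sum(smoothed_eps_loss(residuals, eps, tau))

# Toy regression data: y = 2*x1 - x2 + 0.5 plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + 0.5 + 0.05 * rng.normal(size=200)

# Conjugate-gradient minimization of the smoothed primal
# (gradient approximated numerically by SciPy).
theta0 = np.zeros(X.shape[1] + 1)
result = minimize(primal_objective, theta0, args=(X, y), method="CG")
print("w =", result.x[:-1], "b =", result.x[-1])
```

Because the smoothed objective is differentiable and convex, any unconstrained gradient-based solver applies; the conjugate gradient method is used here only as a stand-in for the paper's tailored algorithm.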


Notes

  1. The source code is available upon request.

Abbreviations

\(\mathbf{x}\) and y :

The predictor vector and the response variable

\(\mathbf{w}\) and b :

The weight vector and the intercept in linear regression model

\(\epsilon\) and C :

The insensitivity parameter and the penalty parameter in the SVR model

ξ and ξ * :

The slack variables in SVR model

α and α * :

The Lagrange multipliers in SVR model

\(V_\epsilon(\cdot)\) and \(S_{\epsilon,\tau}(\cdot)\) :

The original loss function of SVR and its smoothed version

τ :

The smoothing parameter

\({{\varPhi}}(\mathbf{w})\) and \({{\varPhi}}_\tau(\mathbf{w})\) :

The original and the smoothed versions of the objective function of the SVR model in primal form

\(\hat{\mathbf{w}}\) and \(\hat{\mathbf{w}}_\tau\) :

The minimum point of \({{\varPhi}}(\mathbf{w})\) and \({{\varPhi}}_\tau(\mathbf{w})\), respectively

W :

The objective function of SVR model in dual form

\(\mathbf{I}\) and \(\mathbf{I}^*\) :

The identity matrix, and the augmented identity matrix whose first row and first column are 0’s and whose remaining block is the identity matrix

\(\mathbf{H}\) :

The Hessian matrix of the smoothed objective function \({{\varPhi}}_\tau(\mathbf{w})\)

\(K(\cdot,\cdot)\) :

The kernel function

\(\mathcal{H}\) :

Reproducing kernel Hilbert space

\(\langle \cdot,\cdot \rangle_\mathcal{H}\) :

The inner product of two elements of the reproducing kernel Hilbert space

\(\|f\|_\mathcal{H}\) :

The function norm associated with the reproducing kernel Hilbert space

β :

The coefficients associated with the kernel representation of a function in the reproducing kernel Hilbert space

\(\mathbf{K}\) :

The kernel matrix generated from the training set

\(\mathbf{K}^+\) :

An (n + 1) × n matrix whose first row is all 1’s and whose remaining rows form the kernel matrix \(\mathbf{K}\)

\(\mathbf{K}^*\) :

The augmented kernel matrix whose first row and first column are 0’s and whose remaining block is the kernel matrix \(\mathbf{K}\)

\(\mathbf{A}_{\cdot i}\) :

The i-th column of matrix \(\mathbf{A}\)
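For reference, the ε-insensitive loss and the primal objective that the notation above refers to take the standard form shown below. The second display is only an illustration of one common way to smooth the plus function \(\max(0, x)\); the paper's specific \(S_{\epsilon,\tau}\) may be defined differently.

\[
V_\epsilon(r) = \max\bigl(0, |r| - \epsilon\bigr), \qquad
{{\varPhi}}(\mathbf{w}) = \tfrac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} V_\epsilon\bigl(y_i - \mathbf{w}^\top \mathbf{x}_i - b\bigr),
\]
\[
S_{\epsilon,\tau}(r) = p_\tau(r - \epsilon) + p_\tau(-r - \epsilon), \qquad
p_\tau(x) = x + \tfrac{1}{\tau}\log\bigl(1 + e^{-\tau x}\bigr) \approx \max(0, x).
\]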


Acknowledgments

The author would like to extend his sincere gratitude to the associate editor and two anonymous reviewers for their constructive suggestions and comments, which have greatly helped improve the quality of this paper. This work was supported by a Faculty Research Grant from Missouri State University.

Author information

Corresponding author: Songfeng Zheng.


About this article

Cite this article

Zheng, S. A fast algorithm for training support vector regression via smoothed primal function minimization. Int. J. Mach. Learn. & Cyber. 6, 155–166 (2015). https://doi.org/10.1007/s13042-013-0200-6


Keywords

  • Support vector regression
  • Smooth approximation
  • Quadratic programming
  • Conjugate gradient
  • ε-Insensitive loss function