Stream-suitable optimization algorithms for some soft-margin support vector machine variants

Abstract

Soft-margin support vector machines (SVMs) are an important class of classification models that are well known to be highly accurate in a variety of settings and over many applications. The training of SVMs usually requires that the data be available all at once, in batch. The Stochastic majorization–minimization (SMM) algorithm framework allows for the training of SVMs on streamed data instead. We utilize the SMM framework to construct algorithms for training hinge loss, squared-hinge loss, and logistic loss SVMs. We prove that our three SMM algorithms are each convergent and demonstrate that the algorithms are comparable to some state-of-the-art SVM-training methods. An application to the famous MNIST data set is used to demonstrate the potential of our algorithms.

This is a preview of subscription content, log in to check access.

Fig. 1

References

  1. Abe, S. (2010). Support Vector Machines for Pattern Classification. London: Springer.

    Google Scholar 

  2. Bohning, D., & Lindsay, B. R. (1988). Monotonicity of quadratic-approximation algorithms. Annals of the Institute of Mathematical Statistics, 40, 641–663.

    MathSciNet  Article  Google Scholar 

  3. Boyd, S., & Vandenberghe, L. (2004). Convex Optimization. Cambridge: Cambridge University Press.

    Google Scholar 

  4. Cappe, O., & Moulines, E. (2009). On-line expectation-maximizatoin algorithm for latent data models. Journal of the Royal Statistical Society B, 71, 593–613.

    Article  Google Scholar 

  5. Chouzenoux, E., Idier, J., & Moussaoui, S. (2011). A majorize-minimize strategy for subspace otpimization applied to image restoration. IEEE Transactions on Image Processing, 20, 1517–1528.

    MathSciNet  Article  Google Scholar 

  6. Chouzenoux, E., Jezierska, A., Pesquet, J.-C., & Talbot, H. (2013). A majorize-minimize subspace approach for \(l_2\)-\(l_0\) image regularization. SIAM Journal of Imaging Science, 6, 563–591.

    Article  Google Scholar 

  7. Chouzenoux, E., & Pesquet, J.-C. (2017). A stochastic majorize-minimize subspace algorithm for online penalized least squares estimation. IEEE Transactions on Signal Processing, 65, 4770–4783.

    MathSciNet  Article  Google Scholar 

  8. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297.

    MATH  Google Scholar 

  9. De Pierro, A. R. (1993). On the relation between the ISRA and the EM algorithm for positron emission tomography. IEEE Transactions on Medical Imaging, 12, 328–333.

    Article  Google Scholar 

  10. Eddelbuettel, D. (2013). Seamless R and C++ Integration with Rcpp. New York: Springer.

    Google Scholar 

  11. Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., & Lin, C.-J. (2008). LIBLINEAR: a library for large linear classification. Journal of Machine Learning Research, 9, 1871–1874.

    MATH  Google Scholar 

  12. Groenen, P. J. F., Nalbantov, G., & Bioch, J. C. (2008). SVM-Maj: a majorization approach to linear support vector machines with different hinge errors. Advances in Data Analysis and Classification, 2, 17–43.

    MathSciNet  Article  Google Scholar 

  13. Helleputte, T. (2017). LiblineaR: Linear Predictive Models Based on the LIBLINEAR C/C++ Library

  14. Hsia, C.-Y., Zhu, Y., & Lin, C.-J. (2017). A study on trust region update rules in Newton methods for large-scale linear classification. Proceedings of Machine Learning Research, 77, 33–48.

    Google Scholar 

  15. Jolliffe, I. T. (2002). Principal Component Analysis. New York: Springer.

    Google Scholar 

  16. Kim, S., Pasupathy, R., & Henderson, S. G. (2015). Handbook of Simulation Optimization, chapter A guide to sample average approximation (pp. 207–243). New York: Springer.

    Google Scholar 

  17. Lange, K. (2016). MM Optimization Algorithms. Philadelphia: SIAM.

    Google Scholar 

  18. LeCun, Y. (1998). The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/

  19. Lin, C.-J., Weng, R. C., & Keerthi, S. S. (2008). Trust region Newton method for large-scale logistic regression. Journal of Machine Learning Research, 9, 627–650.

    MathSciNet  MATH  Google Scholar 

  20. Mairal, J. (2013). Stochastic majorization-minimization algorithms for large-scale optimization. In Advances in Neural Information Processing Systems (pp. 2283–2291)

  21. Mairal, J. (2015). Incremental majorization-minimization optimization with application to large-scale machine learning. SIAM Journal of Optimization, 25, 829–855.

    MathSciNet  Article  Google Scholar 

  22. McAfee, A., Brynjolfsson, E., & Davenport, T. H. (2012). Big data: the management revolution. Harvard Business Review, 90, 60–68.

    Google Scholar 

  23. Navia-Vasquez, A., Perez-Cruz, F., Artes-Rodriguez, A., & Figueiras-Vidal, A. R. (2001). Weighted least squares training of support vector classifiers leading to compact and adaptive schemes. IEEE Transactions on Neural Networks, 12, 1047–1059.

    Article  Google Scholar 

  24. Nguyen, H. D. & McLachlan, G. J. (2017). Iteratively-reweighted least-squares fitting of support vector machines: a majorization-minimization algorithm approach. In Proceedings of the 2017 Future Technologies Conference (FTC)

  25. Pollard, D. (1991). Asymptotics for least absolute deviation regression estimators. Econometric Theory, 7, 186–199.

    MathSciNet  Article  Google Scholar 

  26. R Core Team (2016). R: a language and environment for statistical computing. R Foundation for Statistical Computing

  27. Razaviyayn, M., Hong, M., & Luo, Z.-Q. (2013). A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM Journal of Optimization, 23, 1126–1153.

    MathSciNet  Article  Google Scholar 

  28. Razaviyayn, M., Sanjabi, M., & Luo, Z.-Q. (2016). A stochastic successive minimization method for nonsmooth nonconvex optimization with applications to transceiver design in wireless communication networks. Mathematical Programming Series B, 157, 515–545.

    MathSciNet  Article  Google Scholar 

  29. Scholkopf, B., & Smola, A. J. (2002). Learning with Kernels. Cambridge: MIT Press.

    Google Scholar 

  30. Shalev-Shwartz, S., Singer, Y., Srebro, N., & Cotter, A. (2011). Pegasos: primal estimated sub-gradient solver for SVM. Mathematical Programming Series B, 127, 3–30.

    MathSciNet  Article  Google Scholar 

  31. Shawe-Taylor, J., & Sun, S. (2011). A review of optimization methodologies in support vector machines. Neurocomputing, 74, 3609–3618.

    Article  Google Scholar 

  32. Steinwart, I., & Christmann, A. (2008). Support Vector Machine. New York: Springer.

    Google Scholar 

  33. Titterington, D. M. (1984). Recursive parameter estimation using incomplete data. Journal of the Royal Statistical Society B, 46, 257–267.

    MathSciNet  MATH  Google Scholar 

  34. Zhang, T. (2004). Solving large scale linear prediction problems using stochastic gradient descent algorithms. In Proceedings of the twenty-first international conference on Machine learning

Download references

Acknowledgements

We thank the Associate Editor and Reviewer of the article for making helpful comments that greatly improved our exposition. HDN was supported by Australian Research Council (ARC) Grant DE170101134. GJM was supported by ARC Grant DP170100907.

Author information

Affiliations

Authors

Corresponding author

Correspondence to Hien D. Nguyen.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Nguyen, H.D., Jones, A.T. & McLachlan, G.J. Stream-suitable optimization algorithms for some soft-margin support vector machine variants. Jpn J Stat Data Sci 1, 81–108 (2018). https://doi.org/10.1007/s42081-018-0001-y

Download citation

Keywords

  • Big data
  • MNIST
  • Stochastic majorization–minimization algorithm
  • Streamed data
  • Support vector machines