A Two-Step Fixed-Point Proximity Algorithm for a Class of Non-differentiable Optimization Models in Machine Learning

Published in: Journal of Scientific Computing

Abstract

Sparse learning models are popular in many application areas. Their objective functions are usually non-smooth, which makes them difficult to solve numerically. We develop a fast and convergent two-step iteration scheme for solving a class of non-differentiable optimization models motivated by sparse learning. To overcome the non-differentiability of the models, we first characterize their solutions as fixed-points of mappings involving the proximity operators of the functions appearing in the objective functions. We then introduce a two-step fixed-point algorithm to compute the solutions. We establish convergence results for the proposed two-step iteration scheme and compare it with the alternating direction method of multipliers (ADMM). In particular, we derive specific two-step iteration algorithms for three models in machine learning: \(\ell ^1\)-SVM classification, \(\ell ^1\)-SVM regression, and SVM classification with the group LASSO regularizer. Numerical experiments on synthetic and benchmark datasets show that the proposed algorithm outperforms ADMM and the linear programming method in computational time and memory storage costs.
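To illustrate the proximity-operator fixed-point idea underlying the abstract (not the paper's specific two-step scheme), the sketch below solves a simple \(\ell ^1\)-regularized least-squares problem by iterating the fixed-point equation \(x = \mathrm{prox}_{\tau \lambda \Vert \cdot \Vert _1}(x - \tau \nabla f(x))\). The function names and the one-step (ISTA-style) iteration are illustrative assumptions, chosen only to show how a non-differentiable objective is handled through its proximity operator.

```python
import numpy as np

def prox_l1(v, t):
    """Proximity operator of t * ||.||_1, i.e. soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fixed_point_prox(A, b, lam, iters=500):
    """Solve min_x 0.5*||Ax - b||^2 + lam*||x||_1 by the fixed-point
    iteration x <- prox_{step*lam*||.||_1}(x - step * A^T(Ax - b))."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)            # gradient of the smooth part
        x = prox_l1(x - step * grad, step * lam)
    return x
```

When \(A\) is the identity, the minimizer is exactly the soft-thresholding of \(b\), which makes the fixed-point characterization easy to verify by hand: for `b = [3.0, 0.5, -2.0]` and `lam = 1.0` the iteration returns `[2.0, 0.0, -1.0]`.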



Author information


Correspondence to Yuesheng Xu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research was supported in part by the Ministry of Science and Technology of China under Grant 2016YFB0200602, by the Natural Science Foundation of China under Grants 11471013, 11771464, and by the US National Science Foundation under Grants DMS-1521661, DMS-1522332, DMS-1912958 and DMS-1939203.


Cite this article

Li, Z., Song, G. & Xu, Y. A Two-Step Fixed-Point Proximity Algorithm for a Class of Non-differentiable Optimization Models in Machine Learning. J Sci Comput 81, 923–940 (2019). https://doi.org/10.1007/s10915-019-01045-7
