Stochastic Coordinate Descent Methods for Regularized Smooth and Nonsmooth Losses

  • Qing Tao
  • Kang Kong
  • Dejun Chu
  • Gaowei Wu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7523)

Abstract

Stochastic Coordinate Descent (SCD) methods are among the first optimization schemes suggested for efficiently solving large-scale problems. However, until now there has been a gap between the convergence-rate analysis and practical SCD algorithms for general smooth losses, and there is no primal SCD algorithm for nonsmooth losses. In this paper, we address these issues using recently developed structural optimization techniques. In particular, we first present a principled and practical SCD algorithm for regularized smooth losses, in which the one-variable subproblem is solved by the proximal gradient method and the adaptive componentwise Lipschitz constant is obtained via a line search strategy. When the loss is nonsmooth, we present a novel SCD algorithm in which the one-variable subproblem is solved by the dual averaging method. We show that our algorithms exploit the regularization structure and achieve several optimal convergence rates that are standard in the literature. Experiments demonstrate the expected efficiency of our SCD algorithms in both the smooth and nonsmooth cases.
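
The abstract describes the two building blocks only at a high level. As a rough illustration (not the authors' exact algorithm), the following Python sketch shows the smooth-loss case on an l1-regularized least-squares problem: each step picks one coordinate at random, solves the one-variable subproblem by a proximal (soft-thresholding) gradient step, and adapts the componentwise Lipschitz constant by backtracking line search. The function names, the choice of least-squares loss, and the synthetic data are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * |.| (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def scd_lasso(A, b, lam, n_iters=5000, seed=0):
    """Randomized coordinate descent for 0.5*||Ax - b||^2 + lam*||x||_1.

    Each iteration updates one random coordinate with a proximal gradient
    step and adapts a per-coordinate Lipschitz estimate by backtracking.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    r = A @ x - b                  # residual Ax - b, maintained incrementally
    L = np.ones(d)                 # per-coordinate Lipschitz estimates
    for _ in range(n_iters):
        j = rng.integers(d)
        g = A[:, j] @ r            # partial derivative of the smooth part
        while True:
            x_j_new = soft_threshold(x[j] - g / L[j], lam / L[j])
            delta = x_j_new - x[j]
            # accept the step if the quadratic upper bound on the smooth part holds
            lhs = 0.5 * np.sum((r + delta * A[:, j]) ** 2)
            rhs = 0.5 * np.sum(r ** 2) + g * delta + 0.5 * L[j] * delta ** 2
            if lhs <= rhs + 1e-12:
                break
            L[j] *= 2.0            # backtrack: increase the local Lipschitz estimate
        r += delta * A[:, j]
        x[j] = x_j_new
    return x

if __name__ == "__main__":
    # tiny usage example on synthetic sparse-recovery data
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 50))
    x_true = np.zeros(50)
    x_true[:5] = rng.standard_normal(5)
    b = A @ x_true + 0.01 * rng.standard_normal(200)
    x_hat = scd_lasso(A, b, lam=0.1)
    print("nonzeros recovered:", np.count_nonzero(np.abs(x_hat) > 1e-3))
```

Maintaining the residual r incrementally keeps the cost of each coordinate step at O(n) rather than a full gradient evaluation, which is the main reason coordinate descent methods scale to problems with many features.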

Keywords

Optimization Algorithms · Coordinate Descent Algorithms · Nonsmooth and Smooth Losses · Large-Scale Learning

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Qing Tao (1, 2)
  • Kang Kong (1)
  • Dejun Chu (1)
  • Gaowei Wu (2)

  1. New Star Research Inst. of Applied Technology, Hefei, P.R. China
  2. Inst. of Automation, Chinese Academy of Sciences, Beijing, P.R. China
