Efficient random coordinate descent algorithms for large-scale structured nonconvex optimization

Abstract

In this paper we analyze several new methods for solving nonconvex optimization problems whose objective function is the sum of two terms: one nonconvex and smooth, the other convex but simple and with known structure. We consider both the unconstrained and the linearly constrained nonconvex case. For optimization problems with this structure, we propose random coordinate descent algorithms and analyze their convergence properties. In the general case, when the objective function is nonconvex and composite, we prove asymptotic convergence of the sequences generated by our algorithms to stationary points, as well as a sublinear rate of convergence in expectation for a suitable optimality measure. Additionally, if the objective function satisfies an error bound condition, we derive a local linear rate of convergence for the expected values of the objective function. We also present extensive numerical experiments evaluating the performance of our algorithms in comparison with state-of-the-art methods.
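To make the algorithmic setting concrete, the sketch below shows one possible random coordinate descent loop for the unconstrained composite model \(\min_x f(x) + h(x)\). It is an illustrative stand-in rather than the paper's exact method: it assumes \(h(x) = \lambda ||x||_1\) (so the coordinate prox is soft-thresholding), uniform coordinate sampling, and known coordinate-wise Lipschitz constants.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*|.| (soft-thresholding), used here as the
    'simple' convex term; any coordinate-wise prox could be substituted."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def random_coordinate_descent(grad_f, L, x0, lam=0.1, iters=5000, seed=0):
    """Illustrative random (single-coordinate) proximal gradient loop for
    min_x f(x) + lam * ||x||_1, with f smooth and possibly nonconvex.

    grad_f : callable returning the full gradient of f at x
             (only the sampled component is used in each iteration)
    L      : array of coordinate-wise Lipschitz constants of grad f
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    n = x.size
    for _ in range(iters):
        i = rng.integers(n)                  # uniformly sampled coordinate
        g_i = grad_f(x)[i]                   # partial derivative along coordinate i
        # coordinate proximal gradient step with step size 1/L_i
        x[i] = soft_threshold(x[i] - g_i / L[i], lam / L[i])
    return x
```

In the linearly constrained case treated in the paper, an update would instead have to modify (at least) a pair of coordinates so that the coupling constraint remains satisfied; the sketch covers only the unconstrained case.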

References

  1. Auslender, A.: Optimisation Methodes Numeriques. Masson, Paris (1976)

  2. Beck, A.: The 2-Coordinate Descent Method for Solving Double-Sided Simplex Constrained Minimization Problems. Technical Report (2012)

  3. Bertsekas, D.: Nonlinear Programming. Athena Scientific (1999)

  4. Bonettini, S.: Inexact block coordinate descent methods with application to nonnegative matrix factorization. J. Numer. Anal. 22, 1431–1452 (2011)

  5. Calamai, P.H., Moré, J.J.: Projected gradient methods for linearly constrained problems. Math. Program. 39, 93–116 (1987)

  6. Chapelle, O., Sindhwani, V., Keerthi, S.: Optimization techniques for semi-supervised support vector machines. J. Mach. Learn. Res. 2, 203–233 (2008)

  7. Fainshil, L., Margaliot, M.: A maximum principle for positive bilinear control systems with applications to positive linear switched systems. SIAM J. Control Optim. 50, 2193–2215 (2012)

  8. Judice, J., Raydan, M., Rosa, S.S., Santos, S.A.: On the solution of the symmetric eigenvalue complementarity problem by the spectral projected gradient algorithm. Comput. Optim. Appl. 47, 391–407 (2008)

  9. Kocvara, M., Outrata, J.: Effective reformulations of the truss topology design problem. Optim. Eng. (2006)

  10. Lin, C.J., Lucidi, S., Palagi, L., Risi, A., Sciandrone, M.: Decomposition algorithm model for singly linearly-constrained problems subject to lower and upper bounds. J. Optim. Theory Appl. 141(1), 107–126 (2009)

  11. Lu, Z., Xiao, L.: Randomized Block Coordinate Non-monotone Gradient Method for a Class of Nonlinear Programming. Technical Report (2013)

  12. Mongeau, M., Torki, M.: Computing eigenelements of real symmetric matrices via optimization. Comput. Optim. Appl. 29, 263–287 (2004)

  13. Necoara, I., Nesterov, Y., Glineur, F.: A Random Coordinate Descent Method for Large Optimization Problems with Linear Constraints. Technical Report (2011). http://acse.pub.ro/person/ion-necoara/

  14. Necoara, I., Patrascu, A.: A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints. Comput. Optim. Appl. (2013)

  15. Necoara, I.: Random coordinate descent algorithms for multi-agent convex optimization over networks. IEEE Trans. Autom. Control 58(8), 1–12 (2013)

  16. Necoara, I., Clipici, D.: Efficient parallel coordinate descent algorithm for convex optimization problems with separable constraints: application to distributed MPC. J. Process Control 23(3), 243–253 (2013)

  17. Nesterov, Y.: Introductory Lectures on Convex Optimization. Kluwer, Dordrecht (2004)

  18. Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)

  19. Nesterov, Y.: Gradient methods for minimizing composite objective function. Math. Program. 140(1), 125–161 (2013)

  20. Parlett, B.N.: The Symmetric Eigenvalue Problem. SIAM (1997)

  21. Polyak, B.T.: Introduction to Optimization. Optimization Software (1987)

  22. Powell, M.J.D.: On search directions for minimization algorithms. Math. Program. (1973)

  23. Richtarik, P., Takac, M.: Efficient Serial and Parallel Coordinate Descent Methods for Huge-Scale Truss Topology Design. Operations Research Proceedings, Springer, pp. 27–32 (2012)

  24. Richtarik, P., Takac, M.: Iteration complexity of randomized block coordinate descent methods for minimizing a composite function. Math. Program. (2012)

  25. Richtarik, P., Takac, M.: Parallel Coordinate Descent Methods for Big Data Optimization. Technical Report (2012). http://www.maths.ed.ac.uk/~richtarik/

  26. Rockafellar, R.T.: The elementary vectors of a subspace in \({\mathbb{R}}^N\). In: Bose, R.C., Dowling, T.A. (eds.) Combinatorial Mathematics and its Applications, Proceedings of the Chapel Hill Conference, pp. 104–127 (1969)

  27. Rockafellar, R.T.: Network Flows and Monotropic Optimization. Wiley-Interscience, New York (1984)

  28. Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss minimization. J. Mach. Learn. Res. 14, 567–599 (2013)

  29. Thi, H.A.L., Moeini, M., Dinh, T.P., Judice, J.: A DC programming approach for solving the symmetric eigenvalue complementarity problem. Comput. Optim. Appl. 51, 1097–1117 (2012)

  30. Tseng, P., Yun, S.: A coordinate gradient descent for nonsmooth separable minimization. Math. Program. 117, 387–423 (2009)

  31. Tseng, P., Yun, S.: A block coordinate gradient descent method for linearly constrained nonsmooth separable optimization. J. Optim. Theory Appl. 140, 513–535 (2009)

  32. Tseng, P.: Approximation accuracy, gradient methods and error bound for structured convex optimization. Math. Program. 125(2), 263–295 (2010)

  33. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, Berlin (1995)

Author information

Corresponding author

Correspondence to Ion Necoara.

Additional information

The research leading to these results has received funding from: the European Union (FP7/2007–2013) EMBOCON under grant agreement no 248940; CNCS (project TE-231, 19/11.08.2010); ANCS (project PN II, 80EU/2010); POSDRU/89/1.5/S/62557.

Appendix

Proof of Lemma 4

We derive our proof based on the following remark (see also [5]): for given \(u, v \in \mathbb {R}^n\), if \(\langle v, u-v \rangle > 0\), then

$$\begin{aligned} \frac{||u||}{||v||} \le \frac{\langle u, u-v \rangle }{\langle v, u-v \rangle }. \end{aligned}$$
(27)
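For completeness, (27) can be verified directly (a short argument, not reproduced from [5]): since \(\langle u, u-v \rangle = \langle v, u-v \rangle + ||u-v||^2\) and \(\langle v, u-v \rangle > 0\) forces \(v \ne 0\), inequality (27) is equivalent to

$$\begin{aligned} \left( ||u|| - ||v|| \right) \langle v, u-v \rangle \le ||v|| \, ||u-v||^2, \end{aligned}$$

which follows from the Cauchy–Schwarz bound \(\langle v, u-v \rangle \le ||v|| \, ||u-v||\) together with the triangle inequality \(||u|| - ||v|| \le ||u-v||\) (the case \(||u|| \le ||v||\) being trivial).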

Let \(\alpha > \beta > 0\). Taking \(u = \text {prox}_{\alpha h}(x + \alpha d) - x \) and \(v= \text {prox}_{\beta h}(x+ \beta d) - x\), we first show that the inequality \(\langle v, u-v \rangle > 0\) holds. Given a real constant \(c>0\), from the optimality conditions of the proximal operator we have:

$$\begin{aligned} x - \text {prox}_{c h}(x) \in \partial c h(\text {prox}_{c h}(x)). \end{aligned}$$

Therefore, from the convexity of \(h\) we can derive that:

$$\begin{aligned} c h(z) \ge c h(\text {prox}_{c h}(y)) + \langle y - \text {prox}_{ch}(y), z - \text {prox}_{c h}(y) \rangle \quad \; \forall y, z \in \mathbb {R}^n. \end{aligned}$$

Taking \(c=\alpha \), \(z = \text {prox}_{\beta h}(x + \beta d)\) and \(y = x + \alpha d\) we have:

$$\begin{aligned} \langle u, u - v \rangle \le \alpha \left( \langle d, u-v \rangle + h(\text {prox}_{\beta h} (x+\beta d)) - h(\text {prox}_{\alpha h}(x+ \alpha d)) \right) . \end{aligned}$$
(28)

Also, if \(c=\beta \), \(z= \text {prox}_{\alpha h}(x+ \alpha d)\) and \(y = x+ \beta d\), then we have:

$$\begin{aligned} \langle v, u-v \rangle \ge \beta \left( \langle d, u-v \rangle + h(\text {prox}_{\beta h}(x+\beta d)) - h(\text {prox}_{\alpha h}(x+ \alpha d)) \right) . \end{aligned}$$
(29)

Subtracting (29) from (28), using that \(\langle u - v, u-v \rangle = ||u-v||^2 > 0\) (we may assume \(u \ne v\), since otherwise the claimed bound holds trivially) and taking into account that \(\alpha > \beta \), we get:

$$\begin{aligned} \langle d, u -v \rangle + h(\text {prox}_{\beta h}(x+\beta d)) - h(\text {prox}_{\alpha h}(x+ \alpha d))> 0. \end{aligned}$$

Therefore, substituting this expression into inequality (29) leads to \(\langle v, u-v \rangle > 0\). Finally, from (27), (28) and (29) we get the inequality:

$$\begin{aligned} \frac{||u||}{||v||} \le \frac{\alpha \left( \langle d, u-v \rangle + h(\text {prox}_{\beta h}(x+\beta d)) - h(\text {prox}_{\alpha h}(x+ \alpha d)) \right) }{\beta \left( \langle d, u-v \rangle + h(\text {prox}_{\beta h}(x+\beta d)) - h(\text {prox}_{\alpha h}(x+ \alpha d)) \right) } = \frac{\alpha }{\beta }, \end{aligned}$$

and then the statement of Lemma 4 can be easily derived. \(\square \)
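As a quick numerical sanity check of the bound just derived (not part of the original proof), the following script verifies \(||\text {prox}_{\alpha h}(x+\alpha d) - x|| \le \tfrac{\alpha }{\beta }\, ||\text {prox}_{\beta h}(x+\beta d) - x||\) on random instances for the particular choice \(h = ||\cdot ||_1\), whose proximal operator is soft-thresholding:

```python
import numpy as np

def prox_l1(z, c):
    """Proximal operator of c*||.||_1 (componentwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - c, 0.0)

rng = np.random.default_rng(1)
for _ in range(10000):
    x, d = rng.standard_normal(5), rng.standard_normal(5)
    beta = rng.uniform(0.1, 1.0)
    alpha = beta + rng.uniform(0.1, 1.0)          # ensures alpha > beta > 0
    u = prox_l1(x + alpha * d, alpha) - x
    v = prox_l1(x + beta * d, beta) - x
    if np.linalg.norm(v) > 1e-12:                 # the ratio bound needs v != 0
        assert np.linalg.norm(u) <= (alpha / beta) * np.linalg.norm(v) + 1e-10
print("||u||/||v|| <= alpha/beta held on all sampled instances")
```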

About this article

Cite this article

Patrascu, A., Necoara, I. Efficient random coordinate descent algorithms for large-scale structured nonconvex optimization. J Glob Optim 61, 19–46 (2015). https://doi.org/10.1007/s10898-014-0151-9
