
Conditional gradient type methods for composite nonlinear and stochastic optimization

  • Full Length Paper
  • Series A
  • Published in: Mathematical Programming


Abstract

In this paper, we present a conditional gradient type (CGT) method for solving a class of composite optimization problems in which the objective function consists of a (weakly) smooth term and a (strongly) convex regularization term. While including a strongly convex term in the subproblems of the classical conditional gradient method improves its rate of convergence, it does not increase the per-iteration cost to that of general proximal type algorithms. More specifically, we present a unified analysis of the CGT method showing that it achieves the best known rate of convergence when the weakly smooth term is nonconvex, and possesses a (nearly) optimal complexity if that term turns out to be convex. While implementing the CGT method requires explicit estimates of problem parameters, such as the level of smoothness of the first term in the objective function, we also present a few variants of the method that relax this requirement. Unlike general proximal type parameter-free methods, these variants of the CGT method do not require any additional effort for computing (sub)gradients of the objective function and/or solving extra subproblems at each iteration. We then generalize these methods to the stochastic setting and present a few new complexity results. To the best of our knowledge, this is the first time that such complexity results are presented for solving stochastic weakly smooth nonconvex and (strongly) convex optimization problems.
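The central idea can be illustrated with a minimal sketch: at each iteration the smooth term is linearized, while the strongly convex regularizer is kept intact in the subproblem. The sketch below is illustrative only and is not the paper's exact algorithm: the test problem (regularized least squares over a box), the feasible set, and the open-loop stepsize `2/(k+2)` are all assumptions chosen so the regularized subproblem has a closed form; the paper's CGT method and stepsize policies differ.

```python
import numpy as np

def cgt_subproblem(grad, mu):
    # Composite subproblem over the box [0, 1]^n:
    #   u = argmin_{u in [0,1]^n} <grad, u> + (mu/2)||u||^2
    # The problem is separable across coordinates, giving the closed form below.
    return np.clip(-grad / mu, 0.0, 1.0)

def cgt_sketch(A, b, mu, iters=200):
    # Minimize f(x) + h(x) = 0.5||Ax - b||^2 + (mu/2)||x||^2 over [0, 1]^n.
    # Only f is linearized; the strongly convex h stays in the subproblem.
    n = A.shape[1]
    x = np.zeros(n)
    for k in range(iters):
        grad = A.T @ (A @ x - b)           # gradient of the smooth term f
        u = cgt_subproblem(grad, mu)       # regularized linear minimization
        gamma = 2.0 / (k + 2)              # illustrative open-loop stepsize
        x = (1 - gamma) * x + gamma * u    # convex-combination update keeps x feasible
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
mu = 0.5
x = cgt_sketch(A, b, mu)
obj = 0.5 * np.linalg.norm(A @ x - b) ** 2 + 0.5 * mu * np.linalg.norm(x) ** 2
```

Because the subproblem here is separable, its per-iteration cost is linear in the dimension, in contrast with a full proximal step, which for a general feasible set would require a projection.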



Notes

  1. Reddi et al. [34] was released several months after the first version of this work.


References

  1. Cartis, C., Gould, N.I.M., Toint, P.L.: On the complexity of steepest descent, Newton's and regularized Newton's methods for nonconvex unconstrained optimization. SIAM J. Optim. 20(6), 2833–2852 (2010)

  2. Chapelle, O., Sindhwani, V., Keerthi, S.S.: Optimization techniques for semi-supervised support vector machines. J. Mach. Learn. Res. 9, 203–233 (2008)

  3. Dang, C.D., Lan, G.: Stochastic block mirror descent methods for nonsmooth and stochastic optimization. SIAM J. Optim. 25(2), 856–881 (2015)

  4. Devolder, O., Glineur, F., Nesterov, Y.E.: First-order methods with inexact oracle: the strongly convex case. CORE Discussion Paper 2013/16 (December 2013)

  5. Dunn, J.C.: Rates of convergence for conditional gradient algorithms near singular and nonsingular extremals. SIAM J. Control Optim. 17(2), 674–701 (1979)

  6. Dunn, J.C.: Convergence rates for conditional gradient sequences generated by implicit step length rules. SIAM J. Control Optim. 18(5), 473–487 (1980)

  7. Frank, M., Wolfe, P.: An algorithm for quadratic programming. Nav. Res. Logist. Q. 3, 95–110 (1956)

  8. Garber, D., Hazan, E.: A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization. arXiv e-prints (2013)

  9. Ghadimi, S., Lan, G., Zhang, H.: Generalized uniformly optimal methods for nonlinear programming. Manuscript, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL (August 2015)

  10. Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for constrained nonconvex stochastic programming. Math. Program. 155, 267–305 (2016)

  11. Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic optimization. Math. Program. 156, 59–99 (2016)

  12. Ghadimi, S., Lan, G.: Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4), 2341–2368 (2013)

  13. Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: shrinking procedures and optimal algorithms. SIAM J. Optim. 23, 2061–2089 (2013)

  14. Grandvalet, Y., Bengio, Y.: Semi-supervised learning by entropy minimization. In: Advances in Neural Information Processing Systems (NIPS) 17 (2005)

  15. Guélat, J., Marcotte, P.: Some comments on Wolfe's 'away step'. Math. Program. 35(1), 110–119 (1986)

  16. Harchaoui, Z., Juditsky, A., Nemirovski, A.S.: Conditional gradient algorithms for machine learning. NIPS OPT Workshop (2012)

  17. Ito, M.: New results on subgradient methods for strongly convex optimization problems with a unified analysis. Department of Mathematical and Computing Sciences, Tokyo Institute of Technology, Tokyo, Japan (April 2015)

  18. Jaggi, M.: Revisiting Frank-Wolfe: projection-free sparse convex optimization. In: The 30th International Conference on Machine Learning (2013)

  19. Jensen, T., Jørgensen, J.H., Hansen, P., Jensen, S.: Implementation of an optimal first-order method for strongly convex total variation regularization. BIT Numer. Math. 52, 329–356 (2012)

  20. Jiang, B., Zhang, S.: Iteration bounds for finding the \(\epsilon \)-stationary points for structured nonconvex optimization. arXiv e-prints (2014)

  21. Kakade, S.M., Shalev-Shwartz, S., Tewari, A.: Regularization techniques for learning with matrices. J. Mach. Learn. Res. 13, 1865–1890 (2012)

  22. Lan, G.: The complexity of large-scale convex programming under a linear optimization oracle. Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL (June 2013)

  23. Lan, G.: Bundle-level type methods uniformly optimal for smooth and non-smooth convex optimization. Math. Program. 149(1), 1–45 (2015)

  24. Lan, G., Zhou, Y.: Conditional gradient sliding for convex optimization. SIAM J. Optim. 26, 1379–1409 (2016)

  25. Luss, R., Teboulle, M.: Conditional gradient algorithms for rank one matrix approximations with a sparsity constraint. SIAM Rev. 55, 65–98 (2013)

  26. Mason, L., Baxter, J., Bartlett, P., Frean, M.: Boosting algorithms as gradient descent in function space. In: Proc. NIPS 12, 512–518 (1999)

  27. Nemirovski, A.S., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. Wiley, New York (1983)

  28. Nemirovskii, A.S., Nesterov, Y.E.: Optimal methods for smooth convex minimization. Zh. Vychisl. Mat. Fiz. 25, 356–369 (1985) (in Russian)

  29. Nesterov, Y.E.: Complexity bounds for primal-dual methods minimizing the model of objective function. CORE Discussion Paper (February 2015)

  30. Nesterov, Y.E.: Universal gradient methods for convex optimization problems. Math. Program. Ser. A (2014)

  31. Nesterov, Y.E.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Boston (2004)

  32. Nesterov, Y.E.: Gradient methods for minimizing composite objective functions. Math. Program. Ser. B 140, 125–161 (2013)

  33. Pshenichnyi, B.N., Danilin, I.M.: Numerical Methods in Extremal Problems. Mir Publishers, Moscow (1978)

  34. Reddi, S.J., Sra, S., Poczos, B., Smola, A.: Stochastic Frank-Wolfe methods for nonconvex optimization. arXiv e-prints (2016)

  35. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 67, 301–320 (2005)



Acknowledgements

The author is very grateful to the associate editor and the anonymous referees for their valuable comments, which improved the quality and presentation of the paper.

Author information



Corresponding author

Correspondence to Saeed Ghadimi.

Additional information

This work was done while the author was working at the School of Mathematics of the Institute for Research in Fundamental Sciences (IPM), P.O. Box: 19395-5746, Tehran, Iran, and supported by a grant from IPM.


About this article


Cite this article

Ghadimi, S. Conditional gradient type methods for composite nonlinear and stochastic optimization. Math. Program. 173, 431–464 (2019).


