Mathematical Programming, Volume 173, Issue 1–2, pp. 431–464

Conditional gradient type methods for composite nonlinear and stochastic optimization

Saeed Ghadimi
Full Length Paper, Series A


In this paper, we present a conditional gradient type (CGT) method for solving a class of composite optimization problems whose objective function consists of a (weakly) smooth term and a (strongly) convex regularization term. Including a strongly convex term in the subproblems of the classical conditional gradient method improves its rate of convergence, while keeping the per-iteration cost below that of general proximal type algorithms. More specifically, we present a unified analysis of the CGT method: it achieves the best known rate of convergence when the weakly smooth term is nonconvex, and it possesses (nearly) optimal complexity when that term is convex. While implementing the CGT method requires explicit estimates of problem parameters, such as the level of smoothness of the first term in the objective function, we also present a few variants of this method that relax such estimation. Unlike general proximal type parameter-free methods, these variants of the CGT method require no additional effort for computing (sub)gradients of the objective function and/or solving extra subproblems at each iteration. We then generalize these methods to the stochastic setting and present a few new complexity results. To the best of our knowledge, this is the first time that such complexity results are presented for solving stochastic weakly smooth nonconvex and (strongly) convex optimization problems.


Keywords

Iteration complexity, Nonconvex optimization, Strongly convex optimization, Conditional gradient type methods, Unified methods, Weakly smooth functions

Mathematics Subject Classification

90C25, 90C26, 90C15, 68Q25, 62L20



The author is very grateful to the associate editor and the anonymous referees for their valuable comments, which helped improve the quality and presentation of the paper.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society 2018

Authors and Affiliations

  1. Princeton University, Princeton, USA
