
Mathematical Programming, Volume 156, Issue 1–2, pp 59–99

Accelerated gradient methods for nonconvex nonlinear and stochastic programming

  • Saeed Ghadimi
  • Guanghui Lan

Full Length Paper, Series A

Abstract

In this paper, we generalize the well-known Nesterov’s accelerated gradient (AG) method, originally designed for convex smooth optimization, to solve nonconvex and possibly stochastic optimization problems. We demonstrate that by properly specifying the stepsize policy, the AG method exhibits the best known rate of convergence for solving general nonconvex smooth optimization problems by using first-order information, similarly to the gradient descent method. We then consider an important class of composite optimization problems and show that the AG method can solve them uniformly, i.e., by using the same aggressive stepsize policy as in the convex case, even if the problem turns out to be nonconvex. We demonstrate that the AG method exhibits an optimal rate of convergence if the composite problem is convex, and improves the best known rate of convergence if the problem is nonconvex. Based on the AG method, we also present new nonconvex stochastic approximation methods and show that they can improve a few existing rates of convergence for nonconvex stochastic optimization. To the best of our knowledge, this is the first time in the literature that the convergence of the AG method has been established for solving nonconvex nonlinear programming.
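To give a concrete sense of the scheme the abstract builds on, the following is a minimal sketch of the classical Nesterov AG iteration for an \(L\)-smooth function, using the standard constant stepsize \(1/L\) and momentum weight \((k-1)/(k+2)\). This is only the textbook convex-case scheme; the stepsize policies analyzed in the paper for the nonconvex and stochastic settings differ, and the function names here (`nesterov_ag`, `grad`) are illustrative, not from the paper.

```python
import numpy as np

def nesterov_ag(grad, x0, L, iters=1000):
    """Classical Nesterov accelerated gradient sketch:
       x_k = y_{k-1} - (1/L) * grad(y_{k-1})
       y_k = x_k + (k-1)/(k+2) * (x_k - x_{k-1})"""
    x_prev = x0.copy()
    y = x0.copy()
    for k in range(1, iters + 1):
        x = y - grad(y) / L          # gradient step from the extrapolated point
        y = x + (k - 1) / (k + 2) * (x - x_prev)  # momentum extrapolation
        x_prev = x
    return x

# Toy example: minimize f(x) = 0.5 x^T A x - b^T x,
# whose gradient is A x - b and smoothness constant is lambda_max(A).
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
grad = lambda x: A @ x - b
L = np.linalg.eigvalsh(A).max()
x_star = nesterov_ag(grad, np.zeros(2), L)
```

On smooth convex problems this scheme attains the \(O(1/k^2)\) rate in function value; the paper's contribution is showing how a properly modified stepsize policy lets an AG-type method handle nonconvex and stochastic problems as well.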

Keywords

Nonconvex optimization · Stochastic programming · Accelerated gradient · Complexity

Mathematics Subject Classification

62L20 · 90C25 · 90C15 · 68Q25


Copyright information

© Springer-Verlag Berlin Heidelberg and Mathematical Optimization Society 2015

Authors and Affiliations

  1. Department of Industrial and Systems Engineering, University of Florida, Gainesville, USA
