Behavior of accelerated gradient methods near critical points of nonconvex functions
We examine the behavior of accelerated gradient methods in smooth nonconvex unconstrained optimization, focusing in particular on their behavior near strict saddle points. Accelerated methods are iterative methods that typically step along a direction that is a linear combination of the previous step and the gradient of the function evaluated at a point at or near the current iterate. (The previous step encodes gradient information from earlier stages in the iterative process.) We show by means of the stable manifold theorem that the heavy-ball method is unlikely to converge to strict saddle points, which are points at which the gradient of the objective is zero but the Hessian has at least one negative eigenvalue. We then examine the behavior of the heavy-ball method and other accelerated gradient methods in the vicinity of a strict saddle point of a nonconvex quadratic function, showing that these methods can diverge from this point more rapidly than the steepest-descent method.
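The escape behavior described above can be illustrated with a minimal sketch. It assumes the standard Polyak form of the heavy-ball update, x_{k+1} = x_k - alpha * grad f(x_k) + beta * (x_k - x_{k-1}), applied to a toy quadratic f(x) = 0.5 * (x_1^2 - delta * x_2^2), which has a strict saddle at the origin. The function names, the test function, and the values of `alpha`, `beta`, and `delta` are illustrative assumptions, not taken from the paper, whose parameterization may differ.

```python
def grad(x, delta):
    """Gradient of the assumed toy quadratic f(x) = 0.5*(x[0]**2 - delta*x[1]**2)."""
    return (x[0], -delta * x[1])


def steepest_descent(x0, alpha, delta, iters):
    """Fixed-step steepest descent: x_{k+1} = x_k - alpha * grad f(x_k)."""
    x = x0
    for _ in range(iters):
        g = grad(x, delta)
        x = (x[0] - alpha * g[0], x[1] - alpha * g[1])
    return x


def heavy_ball(x0, alpha, beta, delta, iters):
    """Heavy-ball iteration with momentum beta, started with x_{-1} = x_0."""
    x_prev, x = x0, x0
    for _ in range(iters):
        g = grad(x, delta)
        # Gradient step plus a multiple of the previous step.
        x_next = tuple(
            x[i] - alpha * g[i] + beta * (x[i] - x_prev[i]) for i in range(2)
        )
        x_prev, x = x, x_next
    return x


if __name__ == "__main__":
    # Start close to the saddle, with a tiny component along the
    # negative-curvature direction (the second coordinate).
    x0 = (1.0, 1e-6)
    sd = steepest_descent(x0, alpha=0.1, delta=0.1, iters=200)
    hb = heavy_ball(x0, alpha=0.1, beta=0.5, delta=0.1, iters=200)
    print("steepest descent, |x_2| after 200 steps:", abs(sd[1]))
    print("heavy ball,       |x_2| after 200 steps:", abs(hb[1]))
```

With these assumed parameters, the component along the negative-curvature direction grows by a larger factor per step under the heavy-ball iteration than under steepest descent, while the component along the positive-curvature direction still contracts for both. This is a single numerical illustration, not a substitute for the analysis in the paper.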
Keywords: Accelerated gradient methods, Nonconvex optimization
Mathematics Subject Classification: 90C26, 49M30
We are grateful to Bin Hu for his advice and suggestions on the manuscript. We are also grateful to the referees and editor for helpful suggestions.