On Linear Convergence of Non-Euclidean Gradient Methods without Strong Convexity and Lipschitz Gradient Continuity
The gradient method is well known to converge globally and linearly when the objective function is strongly convex and admits a Lipschitz continuous gradient. In many applications, however, both assumptions are too stringent, precluding the use of gradient methods. In the early 1960s, following Łojasiewicz's breakthrough on gradient inequalities, it was observed that uniform convexity assumptions could be relaxed and replaced by these inequalities. On the other hand, it has very recently been shown that Lipschitz gradient continuity can be lifted and replaced by a class of functions satisfying a non-Euclidean descent property expressed in terms of a Bregman distance. In this note, we combine these two ideas to introduce a class of non-Euclidean gradient-like inequalities, allowing us to prove linear convergence of a Bregman gradient method for nonconvex minimization, even when neither strong convexity nor Lipschitz gradient continuity holds.
Keywords: Non-Euclidean gradient methods · Nonconvex minimization · Bregman distance · Lipschitz-like convexity condition · Descent lemma without Lipschitz gradient · Gradient dominated inequality · Łojasiewicz gradient inequality · Linear rate of convergence
Mathematics Subject Classification: 65K05 · 49M10 · 90C26 · 90C30 · 65K10
Heinz Bauschke was partially supported by the Natural Sciences and Engineering Research Council of Canada. Jérôme Bolte was partially supported by the Air Force Office of Scientific Research, Air Force Materiel Command, USAF, under Grant Number FA9550-18-1-0226. Jiawei Chen was partially supported by the Natural Science Foundation of China (Nos. 11401487, 11771058), the Basic and Advanced Research Project of Chongqing (cstc2016jcyjA0239), and a National Scholarship under the China Scholarship Council. Marc Teboulle was partially supported by the Israel Science Foundation under ISF Grant 1844-16. Xianfu Wang was partially supported by the Natural Sciences and Engineering Research Council of Canada.