Abstract
Subgradient methods converge linearly on a convex function that grows sharply away from its solution set. In this work, we show that the same is true for sharp functions that are only weakly convex, provided that the subgradient methods are initialized within a fixed tube around the solution set. A variety of statistical and signal processing tasks come equipped with good initialization and provably lead to formulations that are both weakly convex and sharp. Therefore, in such settings, subgradient methods can serve as inexpensive local search procedures. We illustrate the proposed techniques on phase retrieval and covariance estimation problems.
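As a concrete illustration of the regime described in the abstract, the following is a minimal numerical sketch (our own, with illustrative constants, not the paper's exact algorithm or parameters) of a Polyak-type subgradient method applied to the robust phase retrieval objective \(f(x)=\frac{1}{m}\sum _{i}\left| (a_i^Tx)^2-b_i\right| \), assuming noiseless Gaussian measurements and an initialization inside a small tube around the signal:

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 50, 400                       # signal dimension, number of measurements
    x_star = rng.normal(size=n)          # ground-truth signal
    A = rng.normal(size=(m, n))          # rows are the Gaussian measurement vectors a_i
    b = (A @ x_star) ** 2                # noiseless measurements b_i = (a_i^T x*)^2

    def f(x):
        # Robust (ell_1) phase retrieval objective: sharp and weakly convex.
        return np.mean(np.abs((A @ x) ** 2 - b))

    def subgrad(x):
        # One subgradient: (2/m) * sum_i sign((a_i^T x)^2 - b_i) (a_i^T x) a_i.
        Ax = A @ x
        return (2.0 / m) * (A.T @ (np.sign(Ax ** 2 - b) * Ax))

    x = x_star + 0.1 * rng.normal(size=n)    # initialize within a tube around {+x*, -x*}

    for _ in range(200):
        g, fx = subgrad(x), f(x)
        gn2 = np.dot(g, g)
        if fx == 0.0 or gn2 == 0.0:
            break
        x -= (fx / gn2) * g                  # Polyak step: uses min f = 0 for exact data

    print(min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star)))

With exact measurements the minimal value is zero, so the Polyak step length \(f(x_k)/\Vert g_k\Vert ^2\) is computable without tuning; under sharpness and weak convexity, steps of this type contract the distance to the solution set at a linear rate from such an initialization.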
Notes
To the best of our knowledge, the class of weakly convex functions was introduced in 1972 in a local publication of the Institute of Cybernetics in Kiev and then used in [19]. These are the functions that are lower bounded by a linear function up to a little-o term that is uniform in the base point. The term \(\rho \)-weak convexity, as we use it here, corresponds to the setting when the error term is a quadratic. A number of other names are used in the literature instead of \(\rho \)-weak convexity, such as semi-convex, prox-regular, and lower-\(C^2\).
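In the notation used here, \(\rho \)-weak convexity of \(f\) amounts to the standard quadratically perturbed subgradient inequality, or equivalently to convexity of \(f+\frac{\rho }{2}\Vert \cdot \Vert ^2\):

\[ f(y)\;\ge \; f(x)+\langle g,\,y-x\rangle -\frac{\rho }{2}\Vert y-x\Vert ^2 \qquad \text{for all } x,\,y \text{ and } g\in \partial f(x). \]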
Note that this potential function is different from the direct generalization of the one used in phase retrieval, \(\min _{X}\sum _i \left| \Vert X^Ta_i\Vert ^2-b_i\right| \). The reason for the more exotic formulation is that the differences \(a_{2i}a_{2i}^T-a_{2i-1}a_{2i-1}^T\) in (5) allow the authors of [38] to prove sharpness of the objective function by appealing to an \(\ell _2/\ell _1\) restricted isometry property of the resulting linear map. See [38, Section III.B] for details. It is not clear whether the naive objective function is also sharp with high probability under reasonable statistical assumptions.
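For comparison, the differenced formulation referred to above plausibly reads (our reconstruction from the differences named in this note, since (5) itself is not reproduced here)

\[ \min _{X}\;\sum _i \left| \left\langle a_{2i}a_{2i}^T-a_{2i-1}a_{2i-1}^T,\; XX^T\right\rangle -(b_{2i}-b_{2i-1})\right| , \]

which relates to differences of the naive terms via the identity \(\Vert X^Ta\Vert ^2=\langle aa^T,XX^T\rangle \).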
References
Jain, P., Netrapalli, P.: Fast exact matrix completion with finite samples. In: Grünwald, P., Hazan, E., Kale, S. (eds.) Proceedings of the 28th Conference on Learning Theory, Proceedings of Machine Learning Research, vol. 40, pp. 1007–1034. PMLR, Paris, France (2015). http://proceedings.mlr.press/v40/Jain15.html
Tu, S., Boczar, R., Simchowitz, M., Soltanolkotabi, M., Recht, B.: Low-rank solutions of linear matrix equations via Procrustes flow. In: Proceedings of the 33rd International Conference on International Conference on Machine Learning—Volume 48, ICML'16, pp. 964–973. JMLR.org (2016). http://dl.acm.org/citation.cfm?id=3045390.3045493
Candès, E., Li, X., Soltanolkotabi, M.: Phase retrieval via Wirtinger flow: theory and algorithms. IEEE Trans. Inf. Theory 61(4), 1985–2007 (2015). https://doi.org/10.1109/TIT.2015.2399924
Jain, P., Jin, C., Kakade, S., Netrapalli, P.: Global convergence of non-convex gradient descent for computing matrix square root. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 479–488 (2017). http://proceedings.mlr.press/v54/jain17a.html
Chen, Y., Wainwright, M.: Fast low-rank estimation by projected gradient descent: general statistical and algorithmic guarantees (2015). arXiv:1509.03025
Meka, R., Jain, P., Dhillon, I.: Guaranteed rank minimization via singular value projection (2009). arXiv:0909.5457
Boumal, N.: Nonconvex phase synchronization. SIAM J. Optim. 26(4), 2355–2377 (2016)
Brutzkus, A., Globerson, A.: Globally optimal gradient descent for a ConvNet with Gaussian inputs (2017). arXiv:1702.07966
Eremin, I.: The relaxation method of solving systems of inequalities with convex functions on the left-hand side. Dokl. Akad. Nauk SSSR 160, 994–996 (1965)
Poljak, B.: Minimization of unsmooth functionals. USSR Comput. Math. Math. Phys. 9, 14–29 (1969)
Shor, N.: The rate of convergence of the method of the generalized gradient descent with expansion of space. Kibernetika (Kiev) 2, 80–85 (1970)
Goffin, J.: On convergence rates of subgradient optimization methods. Math. Program. 13(3), 329–347 (1977). https://doi.org/10.1007/BF01584346
Supittayapornpong, S., Neely, M.: Staggered time average algorithm for stochastic non-smooth optimization with \(O(1/t)\) convergence (2016). arXiv:1607.02842
Yang, T., Lin, Q.: RSG: beating subgradient method without smoothness and strong convexity (2016). arXiv:1512.03107
Johnstone, P., Moulin, P.: Faster subgradient methods for functions with Hölderian growth (2017). arXiv:1704.00196
Polyak, B.: Sharp minima. Institute of Control Sciences Lecture Notes, Moscow, USSR; Presented at the IIASA Workshop on Generalized Lagrangians and Their Applications, IIASA, Laxenburg, Austria, vol. 3, pp. 369–380 (1979)
Burke, J., Ferris, M.: Weak sharp minima in mathematical programming. SIAM J. Control Optim. 31(5), 1340–1359 (1993). https://doi.org/10.1137/0331063
Poljak, B.: Subgradient methods: a survey of Soviet research. In: Nonsmooth Optimization (Proc. IIASA Workshop, Laxenburg, 1977), IIASA Proc. Ser., vol. 3, pp. 5–29. Pergamon, Oxford (1978)
Nurminskii, E.A.: The quasigradient method for the solving of the nonlinear programming problems. Cybernetics 9(1), 145–150 (1973). https://doi.org/10.1007/BF01068677
Nurminskii, E.A.: Minimization of nondifferentiable functions in the presence of noise. Cybernetics 10(4), 619–621 (1974). https://doi.org/10.1007/BF01071541
Davis, D., Drusvyatskiy, D.: Stochastic subgradient method converges at the rate \(O(k^{-1/4})\) on weakly convex functions (2018). arXiv:1802.02988
Drusvyatskiy, D., Lewis, A.: Error bounds, quadratic growth, and linear convergence of proximal methods. To appear in Mathematics of Operations Research (2016). arXiv:1602.06661
Duchi, J., Ruan, F.: Stochastic methods for composite optimization problems (2017). arXiv:1703.08570
Duchi, J., Ruan, F.: Solving (most) of a set of quadratic equalities: composite optimization for robust phase retrieval (2017). arXiv:1705.02356
Drusvyatskiy, D., Paquette, C.: Efficiency of minimizing compositions of convex functions and smooth maps (2016). arXiv:1605.00125
Davis, D., Grimmer, B.: Proximally guided stochastic method for nonsmooth, nonconvex problems (2017). arXiv:1707.03505
Drusvyatskiy, D.: The proximal point method revisited. To appear in SIAG/OPT Views and News (2018). arXiv:1712.06038
Rockafellar, R., Wets, R.J.-B.: Variational Analysis. Grundlehren der mathematischen Wissenschaften, vol. 317. Springer, Berlin (1998)
Mordukhovich, B.: Variational Analysis and Generalized Differentiation I: Basic Theory. Grundlehren der mathematischen Wissenschaften, vol. 330. Springer, Berlin (2006)
Penot, J.P.: Calculus Without Derivatives, Graduate Texts in Mathematics, vol. 266. Springer, New York (2013). https://doi.org/10.1007/978-1-4614-4538-8
Ioffe, A.: Variational Analysis of Regular Mappings: Theory and Applications. Springer Monographs in Mathematics. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-64277-2
Poliquin, R., Rockafellar, R.: Prox-regular functions in variational analysis. Trans. Am. Math. Soc. 348, 1805–1838 (1996)
Eldar, Y., Mendelson, S.: Phase retrieval: stability and recovery guarantees. Appl. Comput. Harmon. Anal. 36(3), 473–494 (2014). https://doi.org/10.1016/j.acha.2013.08.003
Davis, D., Drusvyatskiy, D., Paquette, C.: The nonsmooth landscape of phase retrieval (2017). arXiv:1711.03247
Chen, Y., Candès, E.: Solving random quadratic systems of equations is nearly as easy as solving linear systems. Commun. Pure Appl. Math. 70(5), 822–883 (2017). https://doi.org/10.1002/cpa.21638
Sun, J., Qu, Q., Wright, J.: A geometric analysis of phase retrieval. To appear in Foundations of Computational Mathematics (2017). arXiv:1602.06664
Tan, Y., Vershynin, R.: Phase retrieval via randomized Kaczmarz: theoretical guarantees (2017). arXiv:1605.08285
Chen, Y., Chi, Y., Goldsmith, A.J.: Exact and stable covariance estimation from quadratic sampling via convex programming. IEEE Trans. Inf. Theory 61(7), 4034–4059 (2015)
Clarke, F.: Optimization and Nonsmooth Analysis. Wiley, New York (1983)
Acknowledgements
Research of Drusvyatskiy and MacPhee was partially supported by the AFOSR YIP Award FA9550-15-1-0237 and by the NSF DMS 1651851 and CCF 1740551 Awards. Research of Paquette was supported by NSF CCF 1740796.
Additional information
Communicated by Evgeni A. Nurminski.
Cite this article
Davis, D., Drusvyatskiy, D., MacPhee, K.J. et al. Subgradient Methods for Sharp Weakly Convex Functions. J Optim Theory Appl 179, 962–982 (2018). https://doi.org/10.1007/s10957-018-1372-8