Gradient descent with random initialization: fast global convergence for nonconvex phase retrieval

  • Full Length Paper
  • Series B
  • Mathematical Programming

Abstract

This paper considers the problem of solving systems of quadratic equations, namely, recovering an object of interest \(\varvec{x}^{\natural }\in {\mathbb {R}}^{n}\) from m quadratic equations/samples \(y_{i}=(\varvec{a}_{i}^{\top }\varvec{x}^{\natural })^{2}, 1\le i\le m\). This problem, also dubbed phase retrieval, spans multiple domains including the physical sciences and machine learning. We investigate the efficacy of gradient descent (or Wirtinger flow) designed for the nonconvex least-squares problem. We prove that under Gaussian designs, gradient descent—when randomly initialized—yields an \(\epsilon \)-accurate solution in \(O\big (\log n+\log (1/\epsilon )\big )\) iterations given nearly minimal samples, thus achieving near-optimal computational and sample complexities at once. This provides the first global convergence guarantee for vanilla gradient descent applied to phase retrieval, without the need for (i) carefully designed initialization, (ii) sample splitting, or (iii) sophisticated saddle-point escaping schemes. All of this is achieved by exploiting the statistical models when analyzing optimization algorithms, via a leave-one-out approach that decouples the statistical dependency between the gradient descent iterates and the data.
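
To make the setup concrete, here is a minimal NumPy sketch of the procedure the abstract describes: vanilla gradient descent on the nonconvex least-squares loss \(f(\varvec{x})=\frac{1}{4m}\sum _{i=1}^{m}\big [(\varvec{a}_{i}^{\top }\varvec{x})^{2}-y_{i}\big ]^{2}\) under Gaussian designs, started from a random initialization. The function name, step size, and iteration budget are our illustrative choices, not values prescribed by the paper.

    import numpy as np

    def wirtinger_flow(A, y, eta=0.1, num_iters=500, seed=None):
        """Vanilla GD on f(x) = (1/4m) * sum_i ((a_i^T x)^2 - y_i)^2, random init."""
        rng = np.random.default_rng(seed)
        m, n = A.shape
        x = rng.standard_normal(n) / np.sqrt(n)           # random initialization
        for _ in range(num_iters):
            Ax = A @ x
            x = x - (eta / m) * (A.T @ ((Ax ** 2 - y) * Ax))  # gradient step
        return x

    rng = np.random.default_rng(0)
    n, m = 100, 1000
    x_nat = rng.standard_normal(n)
    x_nat /= np.linalg.norm(x_nat)                        # unit-norm ground truth
    A = rng.standard_normal((m, n))                       # Gaussian design: a_i ~ N(0, I_n)
    y = (A @ x_nat) ** 2                                  # noiseless quadratic samples
    x_hat = wirtinger_flow(A, y, seed=1)
    # The signal is identifiable only up to a global sign, so check both candidates.
    print(min(np.linalg.norm(x_hat - x_nat), np.linalg.norm(x_hat + x_nat)))

In runs like this one, the error trace exhibits the two-stage behavior the analysis captures: a brief alignment phase followed by geometric decay.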

Notes

  1. An iterative algorithm is said to enjoy linear convergence if the iterates \(\{\varvec{x}^{t}\}\) converge geometrically fast to the minimizer \(\varvec{x}^{\natural }\); a precise statement is given in the first supplement following these notes.

  2. Here, we do not take the absolute value of \(x_{\parallel }^{t}\). As we shall see later, the \(x_{\parallel }^{t}\)’s are of the same sign throughout the execution of the algorithm.

  3. More specifically, the GD update \(\varvec{x}^{t+1}=\varvec{x}^{t}-m^{-1}{\eta _{t}}\sum _{i=1}^{m}\big [(\varvec{a}_{i}^{\top }\varvec{x}^{t})^{2}-y_{i}\big ]\varvec{a}_{i}\varvec{a}_{i}^{\top }\varvec{x}^{t}\approx (\varvec{I}+m^{-1}{\eta _{t}}\sum _{i=1}^{m}y_{i}\varvec{a}_{i}\varvec{a}_{i}^{\top })\varvec{x}^{t}\) when \(\varvec{x}^{t}\approx \varvec{0}\), which is equivalent to a power iteration (without normalization) w.r.t. the data matrix \(\varvec{I}+m^{-1}{\eta _{t}}\sum _{i=1}^{m}y_{i}\varvec{a}_{i}\varvec{a}_{i}^{\top }\); a numerical sanity check of this approximation is given in the supplement following these notes.

  4. When applied to phase retrieval with \(m\asymp n\,\mathrm {poly}\log n\), one has \(L\asymp n\), \(\rho \asymp n\), \(\theta \asymp \gamma \asymp 1\) (see [59, Theorem 2.2]), \(\alpha \asymp 1\), and \(\beta \gtrsim n\) (ignoring logarithmic factors).

  5. This is because of the rotational invariance of Gaussian distributions; the reduction is spelled out in the last supplement following these notes.
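
Supplement to note 1 (our formulation, not a verbatim statement from the paper): an iterative algorithm converges linearly if there exist constants \(C>0\) and \(\rho \in (0,1)\) such that \(\mathrm{dist}(\varvec{x}^{t},\varvec{x}^{\natural })\le C\rho ^{t}\,\mathrm{dist}(\varvec{x}^{0},\varvec{x}^{\natural })\) for all \(t\ge 0\), where for phase retrieval \(\mathrm{dist}(\varvec{x},\varvec{x}^{\natural }):=\min \big \{\Vert \varvec{x}-\varvec{x}^{\natural }\Vert _{2},\,\Vert \varvec{x}+\varvec{x}^{\natural }\Vert _{2}\big \}\) accounts for the unrecoverable global sign. Reaching \(\epsilon \)-accuracy from a bounded starting distance then takes \(O(\log (1/\epsilon ))\) iterations.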
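
Supplement to note 3: a numerical sanity check (our own, with illustrative parameter choices) of the claim that near the origin a single GD step is essentially an unnormalized power-iteration step w.r.t. \(\varvec{M}=\varvec{I}+m^{-1}\eta \sum _{i=1}^{m}y_{i}\varvec{a}_{i}\varvec{a}_{i}^{\top }\); the neglected term involves \((\varvec{a}_{i}^{\top }\varvec{x})^{3}\) and hence scales as \(\Vert \varvec{x}\Vert _{2}^{3}\).

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, eta = 50, 500, 0.1
    x_nat = rng.standard_normal(n)
    x_nat /= np.linalg.norm(x_nat)
    A = rng.standard_normal((m, n))
    y = (A @ x_nat) ** 2

    x = 1e-6 * rng.standard_normal(n)           # a point very close to the origin
    Ax = A @ x
    gd_step = x - (eta / m) * (A.T @ ((Ax ** 2 - y) * Ax))
    M = np.eye(n) + (eta / m) * (A.T * y) @ A   # I + (eta/m) sum_i y_i a_i a_i^T
    power_step = M @ x
    # Relative deviation from an exact power-iteration step; it is of
    # order ||x||^2, here around 1e-12.
    print(np.linalg.norm(gd_step - power_step) / np.linalg.norm(power_step))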
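
Supplement to note 5 (our paraphrase of the standard reduction): if \(\varvec{a}_{i}\sim {\mathcal {N}}(\varvec{0},\varvec{I}_{n})\) and \(\varvec{U}\) is any fixed \(n\times n\) orthonormal matrix, then \(\varvec{U}^{\top }\varvec{a}_{i}\sim {\mathcal {N}}(\varvec{0},\varvec{I}_{n})\) as well, while \(\varvec{a}_{i}^{\top }\varvec{x}^{\natural }=(\varvec{U}^{\top }\varvec{a}_{i})^{\top }(\varvec{U}^{\top }\varvec{x}^{\natural })\). Choosing \(\varvec{U}\) whose first column is \(\varvec{x}^{\natural }/\Vert \varvec{x}^{\natural }\Vert _{2}\) gives \(\varvec{U}^{\top }\varvec{x}^{\natural }=\Vert \varvec{x}^{\natural }\Vert _{2}\,\varvec{e}_{1}\), so the samples \(\{y_{i}\}\) are distributed exactly as if \(\varvec{x}^{\natural }\) were aligned with \(\varvec{e}_{1}\); one may therefore assume this alignment without loss of generality.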

References

  1. Agarwal, N., Allen-Zhu, Z., Bullins, B., Hazan, E., Ma, T.: Finding approximate local minima for nonconvex optimization in linear time (2016). arXiv preprint arXiv:1611.01146

  2. Abbe, E., Fan, J., Wang, K., Zhong, Y.: Entrywise eigenvector analysis of random matrices with low expected rank (2017). arXiv preprint arXiv:1709.09565

  3. Allen-Zhu, Z.: Natasha 2: faster non-convex optimization than SGD (2017). arXiv preprint arXiv:1708.08694

  4. Bandeira, A.S., Cahill, J., Mixon, D.G., Nelson, A.A.: Saving phase: injectivity and stability for phase retrieval. Appl. Comput. Harmonic Anal. 37(1), 106–125 (2014)

  5. Bendory, T., Eldar, Y.C., Boumal, N.: Non-convex phase retrieval from STFT measurements. IEEE Trans. Inf. Theory 64(1), 467–484 (2018)

  6. Chen, Y., Candès, E.J.: Solving random quadratic systems of equations is nearly as easy as solving linear systems. Commun. Pure Appl. Math. 70(5), 822–883 (2017)

  7. Chen, Y., Candès, E.: The projected power method: an efficient algorithm for joint alignment from pairwise differences. Commun. Pure Appl. Math. 71(8), 1648–1714 (2018)

  8. Chen, Y., Cheng, C., Fan, J.: Asymmetry helps: eigenvalue and eigenvector analyses of asymmetrically perturbed low-rank matrices (2018). arXiv preprint arXiv:1811.12804

  9. Chen, Y., Chi, Y., Goldsmith, A.J.: Exact and stable covariance estimation from quadratic sampling via convex programming. IEEE Trans. Inf. Theory 61(7), 4034–4059 (2015)

  10. Candès, E.J., Eldar, Y.C., Strohmer, T., Voroninski, V.: Phase retrieval via matrix completion. SIAM J. Imaging Sci. 6(1), 199–225 (2013)

  11. Chen, P., Fannjiang, A., Liu, G.-R.: Phase retrieval with one or two diffraction patterns by alternating projections with the null initialization. J. Fourier Anal. Appl. 24(3), 719–758 (2018)

  12. Chen, Y., Fan, J., Ma, C., Wang, K.: Spectral method and regularized MLE are both optimal for top-\(K\) ranking (2017). arXiv preprint arXiv:1707.09971

  13. Candès, E.J., Li, X.: Solving quadratic equations via PhaseLift when there are about as many equations as unknowns. Found. Comput. Math. 14(5), 1017–1026 (2014)

  14. Chi, Y., Lu, Y.M.: Kaczmarz method for solving quadratic equations. IEEE Signal Process. Lett. 23(9), 1183–1187 (2016)

  15. Chen, J., Li, X.: Memory-efficient kernel PCA via partial matrix sampling and nonconvex optimization: a model-free analysis of local minima (2017). arXiv preprint arXiv:1711.01742

  16. Chi, Y., Lu, Y.M., Chen, Y.: Nonconvex optimization meets low-rank matrix factorization: an overview (2018). arXiv preprint arXiv:1809.09573

  17. Cai, T.T., Li, X., Ma, Z.: Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow. Ann. Stat. 44(5), 2221–2251 (2016)

  18. Candès, E.J., Li, X., Soltanolkotabi, M.: Phase retrieval via Wirtinger flow: theory and algorithms. IEEE Trans. Inf. Theory 61(4), 1985–2007 (2015)

  19. Cai, J.-F., Liu, H., Wang, Y.: Fast rank one alternating minimization algorithm for phase retrieval (2017). arXiv preprint arXiv:1708.08751

  20. Candès, E.J., Strohmer, T., Voroninski, V.: PhaseLift: exact and stable signal recovery from magnitude measurements via convex programming. Commun. Pure Appl. Math. 66(8), 1241–1274 (2013)

  21. Chen, Y., Wainwright, M.J.: Fast low-rank estimation by projected gradient descent: general statistical and algorithmic guarantees (2015). arXiv preprint arXiv:1509.03025

  22. Chen, J., Wang, L., Zhang, X., Gu, Q.: Robust Wirtinger flow for phase retrieval with arbitrary corruption (2017). arXiv preprint arXiv:1704.06256

  23. Chen, Y., Yi, X., Caramanis, C.: A convex formulation for mixed regression with two components: minimax optimal rates. In: Conference on Learning Theory, pp. 560–604 (2014)

  24. Cai, T., Zhang, A.: ROP: matrix recovery via rank-one projections. Ann. Stat. 43(1), 102–138 (2015)

  25. Demanet, L., Hand, P.: Stable optimizationless recovery from phaseless linear measurements. J. Fourier Anal. Appl. 20(1), 199–221 (2014)

  26. Du, S.S., Jin, C., Lee, J.D., Jordan, M.I., Singh, A., Poczos, B.: Gradient descent can take exponential time to escape saddle points. In: Advances in Neural Information Processing Systems, pp. 1067–1077 (2017)

  27. Duchi, J.C., Ruan, F.: Solving (most) of a set of quadratic equalities: composite optimization for robust phase retrieval (2017). arXiv preprint arXiv:1705.02356

  28. El Karoui, N.: On the impact of predictor geometry on the performance of high-dimensional ridge-regularized generalized robust regression estimators. Probab. Theory Rel. Fields 170(1–2), 95–175 (2018)

  29. El Karoui, N., Bean, D., Bickel, P.J., Lim, C., Yu, B.: On robust regression with high-dimensional predictors. Proc. Natl. Acad. Sci. 110(36), 14557–14562 (2013)

  30. Fu, H., Chi, Y., Liang, Y.: Local geometry of one-hidden-layer neural networks for logistic regression (2018). arXiv preprint arXiv:1802.06463

  31. Ge, R., Huang, F., Jin, C., Yuan, Y.: Escaping from saddle points – online stochastic gradient for tensor decomposition. In: Conference on Learning Theory, pp. 797–842 (2015)

  32. Gao, B., Xu, Z.: Phase retrieval using Gauss–Newton method (2016). arXiv preprint arXiv:1606.08135

  33. Huang, W., Hand, P.: Blind deconvolution by a steepest descent algorithm on a quotient manifold (2017). arXiv preprint arXiv:1710.03309

  34. Hao, B., Zhang, A., Cheng, G.: Sparse and low-rank tensor estimation via cubic sketchings (2018). arXiv preprint arXiv:1801.09326

  35. Jin, C., Ge, R., Netrapalli, P., Kakade, S.M., Jordan, M.I.: How to escape saddle points efficiently (2017). arXiv preprint arXiv:1703.00887

  36. Jin, C., Netrapalli, P., Jordan, M.I.: Accelerated gradient descent escapes saddle points faster than gradient descent (2017). arXiv preprint arXiv:1711.10456

  37. Keshavan, R.H., Montanari, A., Oh, S.: Matrix completion from a few entries. IEEE Trans. Inf. Theory 56(6), 2980–2998 (2010)

  38. Kueng, R., Rauhut, H., Terstiege, U.: Low rank matrix recovery from rank one measurements. Appl. Comput. Harmonic Anal. 42(1), 88–116 (2017)

  39. Lang, S.: Real and Functional Analysis, vol. 10, pp. 11–13. Springer, New York (1993)

  40. Li, G., Gu, Y., Lu, Y.M.: Phase retrieval using iterative projections: dynamics in the large systems limit. In: Allerton Conference on Communication, Control, and Computing, pp. 1114–1118. IEEE (2015)

  41. Lu, Y.M., Li, G.: Phase transitions of spectral initialization for high-dimensional nonconvex estimation (2017). arXiv preprint arXiv:1702.06435

  42. Li, X., Ling, S., Strohmer, T., Wei, K.: Rapid, robust, and reliable blind deconvolution via nonconvex optimization (2016). arXiv preprint arXiv:1606.04933

  43. Li, Y., Ma, C., Chen, Y., Chi, Y.: Nonconvex matrix factorization from rank-one measurements (2018). arXiv preprint arXiv:1802.06286

  44. Li, Y., Ma, T., Zhang, H.: Algorithmic regularization in over-parameterized matrix recovery (2017). arXiv preprint arXiv:1712.09203

  45. Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient descent converges to minimizers (2016). arXiv preprint arXiv:1602.04915

  46. Mondelli, M., Montanari, A.: Fundamental limits of weak recovery with applications to phase retrieval (2017). arXiv preprint arXiv:1708.05932

  47. Murray, R., Swenson, B., Kar, S.: Revisiting normalized gradient descent: evasion of saddle points (2017). arXiv preprint arXiv:1711.05224

  48. Ma, C., Wang, K., Chi, Y., Chen, Y.: Implicit regularization in nonconvex statistical estimation: gradient descent converges linearly for phase retrieval, matrix completion and blind deconvolution (2017). arXiv preprint arXiv:1711.10467

  49. Ma, J., Xu, J., Maleki, A.: Optimization-based AMP for phase retrieval: the impact of initialization and \(\ell _2\)-regularization (2018). arXiv preprint arXiv:1801.01170

  50. Netrapalli, P., Jain, P., Sanghavi, S.: Phase retrieval using alternating minimization. In: Advances in Neural Information Processing Systems, pp. 2796–2804 (2013)

  51. Nesterov, Y., Polyak, B.T.: Cubic regularization of Newton method and its global performance. Math. Program. 108(1), 177–205 (2006)

  52. Qu, Q., Zhang, Y., Eldar, Y.C., Wright, J.: Convolutional phase retrieval via gradient descent (2017). arXiv preprint arXiv:1712.00716

  53. Sur, P., Chen, Y., Candès, E.J.: The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled chi-square. Probab. Theory Rel. Fields (to appear) (2018)

  54. Shechtman, Y., Eldar, Y.C., Cohen, O., Chapman, H.N., Miao, J., Segev, M.: Phase retrieval with application to optical imaging: a contemporary overview. IEEE Signal Process. Mag. 32(3), 87–109 (2015)

  55. Soltanolkotabi, M., Javanmard, A., Lee, J.D.: Theoretical insights into the optimization landscape of over-parameterized shallow neural networks (2017). arXiv preprint arXiv:1707.04926

  56. Sun, R., Luo, Z.-Q.: Guaranteed matrix completion via non-convex factorization. IEEE Trans. Inf. Theory 62(11), 6535–6579 (2016)

  57. Soltanolkotabi, M.: Algorithms and Theory for Clustering and Nonconvex Quadratic Programming. PhD thesis, Stanford University (2014)

  58. Soltanolkotabi, M.: Structured signal recovery from quadratic measurements: breaking sample complexity barriers via nonconvex optimization (2017). arXiv preprint arXiv:1702.06175

  59. Sun, J., Qu, Q., Wright, J.: A geometric analysis of phase retrieval. In: 2016 IEEE International Symposium on Information Theory (ISIT), pp. 2379–2383. IEEE (2016)

  60. Schudy, W., Sviridenko, M.: Concentration and moment inequalities for polynomials of independent random variables. In: Proceedings of the Twenty-Third Annual ACM–SIAM Symposium on Discrete Algorithms, pp. 437–446. ACM, New York (2012)

  61. Tu, S., Boczar, R., Simchowitz, M., Soltanolkotabi, M., Recht, B.: Low-rank solutions of linear matrix equations via Procrustes flow. In: Proceedings of the 33rd International Conference on Machine Learning, vol. 48, pp. 964–973. JMLR.org (2016)

  62. Tan, Y.S., Vershynin, R.: Phase retrieval via randomized Kaczmarz: theoretical guarantees (2017). arXiv preprint arXiv:1706.09993

  63. Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices (2010). arXiv preprint arXiv:1011.3027

  64. Wei, K.: Solving systems of phaseless equations via Kaczmarz methods: a proof of concept study. Inverse Probl. 31(12), 125008 (2015)

  65. Wang, G., Giannakis, G.B., Eldar, Y.C.: Solving systems of random quadratic equations via truncated amplitude flow. IEEE Trans. Inf. Theory 64(2), 773–794 (2018)

  66. Wang, G., Giannakis, G.B., Saad, Y., Chen, J.: Solving almost all systems of random quadratic equations (2017). arXiv preprint arXiv:1705.10407

  67. Yang, Z., Yang, L.F., Fang, E.X., Zhao, T., Wang, Z., Neykov, M.: Misspecified nonconvex statistical optimization for phase retrieval (2017). arXiv preprint arXiv:1712.06245

  68. Zhong, Y., Boumal, N.: Near-optimal bounds for phase synchronization (2017). arXiv preprint arXiv:1703.06605

  69. Zhang, H., Chi, Y., Liang, Y.: Provable non-convex phase retrieval with outliers: median truncated Wirtinger flow. In: International Conference on Machine Learning, pp. 1022–1031 (2016)

  70. Zhang, T.: Phase retrieval using alternating minimization in a batch setting (2017). arXiv preprint arXiv:1706.08167

  71. Zheng, Q., Lafferty, J.: Convergence analysis for rectangular matrix completion using Burer–Monteiro factorization and gradient descent (2016). arXiv preprint arXiv:1605.07051

  72. Zhang, L., Wang, G., Giannakis, G.B., Chen, J.: Compressive phase retrieval via reweighted amplitude flow (2017). arXiv preprint arXiv:1712.02426

  73. Zhao, T., Wang, Z., Liu, H.: A nonconvex optimization framework for low rank matrix estimation. In: Advances in Neural Information Processing Systems, pp. 559–567 (2015)

  74. Zhang, H., Zhou, Y., Liang, Y., Chi, Y.: A nonconvex approach for phase retrieval: reshaped Wirtinger flow and incremental algorithms. J. Mach. Learn. Res. 18(1), 5164–5198 (2017)


Acknowledgements

Y. Chen is supported in part by the AFOSR YIP award FA9550-19-1-0030, by the ARO grant W911NF-18-1-0303, by the ONR grant N00014-19-1-2120, and by the Princeton SEAS innovation award. Y. Chi is supported in part by AFOSR under the grant FA9550-15-1-0205, by ONR under the grant N00014-18-1-2142, by ARO under the grant W911NF-18-1-0303, and by NSF under the grants CAREER ECCS-1818571 and CCF-1806154. J. Fan is supported in part by NSF grants DMS-1662139 and DMS-1712591 and NIH grant 2R01-GM072611-13.

Author information

Correspondence to Cong Ma.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 746 KB)

About this article

Cite this article

Chen, Y., Chi, Y., Fan, J. et al. Gradient descent with random initialization: fast global convergence for nonconvex phase retrieval. Math. Program. 176, 5–37 (2019). https://doi.org/10.1007/s10107-019-01363-6
