Abstract
This paper considers the problem of solving systems of quadratic equations, namely, recovering an object of interest \(\varvec{x}^{\natural }\in {\mathbb {R}}^{n}\) from \(m\) quadratic equations/samples \(y_{i}=(\varvec{a}_{i}^{\top }\varvec{x}^{\natural })^{2}, 1\le i\le m\). This problem, also known as phase retrieval, spans multiple domains including physical sciences and machine learning. We investigate the efficacy of gradient descent (or Wirtinger flow) designed for the nonconvex least squares problem. We prove that under Gaussian designs, gradient descent, when randomly initialized, yields an \(\epsilon \)-accurate solution in \(O\big (\log n+\log (1/\epsilon )\big )\) iterations given nearly minimal samples, thus achieving near-optimal computational and sample complexities at once. This provides the first global convergence guarantee concerning vanilla gradient descent for phase retrieval, without the need for (i) carefully designed initialization, (ii) sample splitting, or (iii) sophisticated saddle-point escaping schemes. All of these are achieved by exploiting the statistical models in analyzing optimization algorithms, via a leave-one-out approach that enables the decoupling of certain statistical dependency between the gradient descent iterates and the data.
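As a rough illustration of the procedure the abstract describes, the following sketch runs vanilla gradient descent from a random starting point on the least squares loss \(f(\varvec{x})=\frac{1}{4m}\sum _{i=1}^{m}\big [(\varvec{a}_{i}^{\top }\varvec{x})^{2}-y_{i}\big ]^{2}\). The problem sizes, the constant step size 0.1, the \({\mathcal {N}}(\varvec{0},n^{-1}\varvec{I})\) initialization, the iteration count, and all function names are assumptions chosen for this example rather than values stated in the abstract.

```python
import math
import random


def make_problem(n, m, rng):
    """Draw a unit-norm signal and m Gaussian design vectors (toy instance)."""
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in x_true))
    x_true = [v / norm for v in x_true]
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    # Quadratic samples y_i = (a_i^T x_true)^2.
    y = [sum(p * q for p, q in zip(a, x_true)) ** 2 for a in A]
    return x_true, A, y


def gradient(x, A, y):
    """Gradient of f(x) = (1/4m) * sum_i ((a_i^T x)^2 - y_i)^2."""
    m, n = len(A), len(x)
    g = [0.0] * n
    for a, y_i in zip(A, y):
        ax = sum(p * q for p, q in zip(a, x))
        coeff = (ax * ax - y_i) * ax / m
        for j in range(n):
            g[j] += coeff * a[j]
    return g


def dist_up_to_sign(x, x_true):
    """Distance to the closer of +x_true and -x_true (the sign is not identifiable)."""
    plus = math.sqrt(sum((u - v) ** 2 for u, v in zip(x, x_true)))
    minus = math.sqrt(sum((u + v) ** 2 for u, v in zip(x, x_true)))
    return min(plus, minus)


def gradient_descent(A, y, n, rng, step=0.1, iters=500):
    """Vanilla GD from a random start; step and iters are assumed values."""
    x = [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]
    for _ in range(iters):
        g = gradient(x, A, y)
        x = [x_j - step * g_j for x_j, g_j in zip(x, g)]
    return x


rng = random.Random(0)
n, m = 5, 300
x_true, A, y = make_problem(n, m, rng)
x_hat = gradient_descent(A, y, n, rng)
error = dist_up_to_sign(x_hat, x_true)
print(f"distance to +/- x_true after GD: {error:.2e}")
```

The error is measured up to a global sign because \(\varvec{x}^{\natural }\) and \(-\varvec{x}^{\natural }\) produce identical samples. A small, heavily oversampled instance like this one only illustrates the mechanics; it says nothing about the sample complexity regime the paper analyzes.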
Notes
An iterative algorithm is said to enjoy linear convergence if the iterates \(\{\varvec{x}^{t}\}\) converge geometrically fast to the minimizer \(\varvec{x}^{\natural }\).
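One common way to formalize this footnote is sketched below. The contraction factor \(\rho \) and the sign-invariant distance \(\mathrm {dist}(\cdot ,\cdot )\) are notation introduced here for illustration, not symbols defined in the text above.

\[
\mathrm {dist}(\varvec{x}^{t+1},\varvec{x}^{\natural })\le \rho \,\mathrm {dist}(\varvec{x}^{t},\varvec{x}^{\natural }),\quad 0<\rho <1,\qquad \mathrm {dist}(\varvec{x},\varvec{x}^{\natural }):=\min \big \{\Vert \varvec{x}-\varvec{x}^{\natural }\Vert _{2},\Vert \varvec{x}+\varvec{x}^{\natural }\Vert _{2}\big \}.
\]

Unrolling the recursion gives \(\mathrm {dist}(\varvec{x}^{t},\varvec{x}^{\natural })\le \rho ^{t}\,\mathrm {dist}(\varvec{x}^{0},\varvec{x}^{\natural })\), so an \(\epsilon \)-accurate iterate is reached once \(t\ge \log \big (\mathrm {dist}(\varvec{x}^{0},\varvec{x}^{\natural })/\epsilon \big )/\log (1/\rho )\). Under this reading, linear convergence accounts for the \(\log (1/\epsilon )\) term in the iteration count quoted in the abstract.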
Here, we do not take the absolute value of \(x_{\parallel }^{t}\). As we shall see later, the \(x_{\parallel }^{t}\)’s are of the same sign throughout the execution of the algorithm.
More specifically, when \(\varvec{x}^{t}\approx \varvec{0}\), the GD update obeys \(\varvec{x}^{t+1}=\varvec{x}^{t}-m^{-1}{\eta _{t}}\sum _{i=1}^{m}\big [(\varvec{a}_{i}^{\top }\varvec{x}^{t})^{2}-y_{i}\big ]\varvec{a}_{i}\varvec{a}_{i}^{\top }\varvec{x}^{t}\approx (\varvec{I}+m^{-1}{\eta _{t}}\sum _{i=1}^{m}y_{i}\varvec{a}_{i}\varvec{a}_{i}^{\top })\varvec{x}^{t}\), which is equivalent to a power iteration (without normalization) w.r.t. the data matrix \(\varvec{I}+m^{-1}{\eta _{t}}\sum _{i=1}^{m}y_{i}\varvec{a}_{i}\varvec{a}_{i}^{\top }\).
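The approximation in this footnote can be checked numerically. The sketch below compares one exact GD step with one step of the unnormalized power iteration at a point of norm \(10^{-3}\). The instance size, step size, seed, and variable names are assumptions made for this example.

```python
import math
import random

rng = random.Random(1)
n, m, step = 5, 200, 0.1  # assumed toy sizes and step size

# Hypothetical test instance: unit-norm signal, Gaussian design vectors.
x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
scale = math.sqrt(sum(v * v for v in x_true))
x_true = [v / scale for v in x_true]
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = [sum(p * q for p, q in zip(a, x_true)) ** 2 for a in A]

# An iterate close to the origin (Euclidean norm 1e-3).
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
scale = math.sqrt(sum(v * v for v in x))
x = [1e-3 * v / scale for v in x]

gd_next = list(x)     # exact GD step on the least squares loss
power_next = list(x)  # (I + (step/m) * sum_i y_i a_i a_i^T) x
for a, y_i in zip(A, y):
    ax = sum(p * q for p, q in zip(a, x))
    for j in range(n):
        gd_next[j] -= step / m * (ax * ax - y_i) * ax * a[j]
        power_next[j] += step / m * y_i * ax * a[j]

# The two updates differ only by the cubic term (step/m) * sum_i (a_i^T x)^3 a_i.
gap = math.sqrt(sum((u - v) ** 2 for u, v in zip(gd_next, power_next)))
size = math.sqrt(sum(v * v for v in power_next))
rel_gap = gap / size
print(f"relative gap between GD step and power-iteration step: {rel_gap:.2e}")
```

The gap is cubic in \(\Vert \varvec{x}^{t}\Vert _{2}\) while the update itself is linear in it, which is why the two steps become indistinguishable as the iterate approaches the origin.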
When applied to phase retrieval with \(m\asymp n\,\mathrm {poly}\log n\), one has \(L\asymp n\), \(\rho \asymp n\), \(\theta \asymp \gamma \asymp 1\) (see [59, Theorem 2.2]), \(\alpha \asymp 1\), and \(\beta \gtrsim n\) (ignoring logarithmic factors).
This is because of the rotational invariance of Gaussian distributions.
References
Agarwal, N., Allen-Zhu, Z., Bullins, B., Hazan, E., Ma, T.: Finding approximate local minima for nonconvex optimization in linear time (2016). arXiv preprint arXiv:1611.01146
Abbe, E., Fan, J., Wang, K., Zhong, Y.: Entrywise eigenvector analysis of random matrices with low expected rank (2017). arXiv preprint arXiv:1709.09565
Allen-Zhu, Z.: Natasha 2: faster non-convex optimization than SGD (2017). arXiv preprint arXiv:1708.08694
Bandeira, A.S., Cahill, J., Mixon, D.G., Nelson, A.A.: Saving phase: injectivity and stability for phase retrieval. Appl. Comput. Harmonic Anal. 37(1), 106–125 (2014)
Bendory, T., Eldar, Y.C., Boumal, N.: Non-convex phase retrieval from STFT measurements. IEEE Trans. Inf. Theory 64(1), 467–484 (2018)
Chen, Y., Candès, E.J.: Solving random quadratic systems of equations is nearly as easy as solving linear systems. Commun. Pure Appl. Math. 70(5), 822–883 (2017)
Chen, Y., Candès, E.J.: The projected power method: an efficient algorithm for joint alignment from pairwise differences. Commun. Pure Appl. Math. 71(8), 1648–1714 (2018)
Chen, Y., Cheng, C., Fan, J.: Asymmetry helps: eigenvalue and eigenvector analyses of asymmetrically perturbed low-rank matrices (2018). arXiv preprint arXiv:1811.12804
Chen, Y., Chi, Y., Goldsmith, A.J.: Exact and stable covariance estimation from quadratic sampling via convex programming. IEEE Trans. Inf. Theory 61(7), 4034–4059 (2015)
Candès, E.J., Eldar, Y.C., Strohmer, T., Voroninski, V.: Phase retrieval via matrix completion. SIAM J. Imaging Sci. 6(1), 199–225 (2013)
Chen, P., Fannjiang, A., Liu, G.-R.: Phase retrieval with one or two diffraction patterns by alternating projections with the null initialization. J. Fourier Anal. Appl. 24(3), 719–758 (2018)
Chen, Y., Fan, J., Ma, C., Wang, K.: Spectral method and regularized MLE are both optimal for top-\(K\) ranking (2017). arXiv preprint arXiv:1707.09971
Candès, E.J., Li, X.: Solving quadratic equations via PhaseLift when there are about as many equations as unknowns. Found. Comput. Math. 14(5), 1017–1026 (2014)
Chi, Y., Lu, Y.M.: Kaczmarz method for solving quadratic equations. IEEE Signal Process. Lett. 23(9), 1183–1187 (2016)
Chen, J., Li, X.: Memory-efficient kernel PCA via partial matrix sampling and nonconvex optimization: a model-free analysis of local minima (2017). arXiv preprint arXiv:1711.01742
Chi, Y., Lu, Y.M., Chen, Y.: Nonconvex optimization meets low-rank matrix factorization: an overview (2018). arXiv preprint arXiv:1809.09573
Cai, T.T., Li, X., Ma, Z.: Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow. Ann. Stat. 44(5), 2221–2251 (2016)
Candès, E.J., Li, X., Soltanolkotabi, M.: Phase retrieval via Wirtinger flow: theory and algorithms. IEEE Trans. Inf. Theory 61(4), 1985–2007 (2015)
Cai, J.-F., Liu, H., Wang, Y.: Fast rank one alternating minimization algorithm for phase retrieval (2017). arXiv preprint arXiv:1708.08751
Candès, E.J., Strohmer, T., Voroninski, V.: PhaseLift: exact and stable signal recovery from magnitude measurements via convex programming. Commun. Pure Appl. Math. 66(8), 1241–1274 (2013)
Chen, Y., Wainwright, M.J.: Fast low-rank estimation by projected gradient descent: general statistical and algorithmic guarantees (2015). arXiv preprint arXiv:1509.03025
Chen, J., Wang, L., Zhang, X., Gu, Q.: Robust Wirtinger flow for phase retrieval with arbitrary corruption (2017). arXiv preprint arXiv:1704.06256
Chen, Y., Yi, X., Caramanis, C.: A convex formulation for mixed regression with two components: minimax optimal rates. In: Conference on Learning Theory, pp. 560–604 (2014)
Cai, T., Zhang, A.: ROP: matrix recovery via rank-one projections. Ann. Stat. 43(1), 102–138 (2015)
Demanet, L., Hand, P.: Stable optimizationless recovery from phaseless linear measurements. J. Fourier Anal. Appl. 20(1), 199–221 (2014)
Du, S.S., Jin, C., Lee, J.D., Jordan, M.I., Singh, A., Poczos, B.: Gradient descent can take exponential time to escape saddle points. In: Advances in Neural Information Processing Systems, pp. 1067–1077 (2017)
Duchi, J.C., Ruan, F.: Solving (most) of a set of quadratic equalities: composite optimization for robust phase retrieval (2017). arXiv preprint arXiv:1705.02356
El Karoui, N.: On the impact of predictor geometry on the performance on high-dimensional ridge-regularized generalized robust regression estimators. Probab. Theory Rel. Fields 170(1–2), 95–175 (2018)
El Karoui, N., Bean, D., Bickel, P.J., Lim, C., Yu, B.: On robust regression with high-dimensional predictors. Proc. Natl. Acad. Sci. 110(36), 14557–14562 (2013)
Fu, H., Chi, Y., Liang, Y.: Local geometry of one-hidden-layer neural networks for logistic regression (2018). arXiv preprint arXiv:1802.06463
Ge, R., Huang, F., Jin, C., Yuan, Y.: Escaping from saddle points: online stochastic gradient for tensor decomposition. In: Conference on Learning Theory, pp. 797–842 (2015)
Gao, B., Xu, Z.: Phase retrieval using Gauss–Newton method (2016). arXiv preprint arXiv:1606.08135
Huang, W., Hand, P.: Blind deconvolution by a steepest descent algorithm on a quotient manifold (2017). arXiv preprint arXiv:1710.03309
Hao, B., Zhang, A., Cheng, G.: Sparse and low-rank tensor estimation via cubic sketchings (2018). arXiv preprint arXiv:1801.09326
Jin, C., Ge, R., Netrapalli, P., Kakade, S.M., Jordan, M.I.: How to escape saddle points efficiently (2017). arXiv preprint arXiv:1703.00887
Jin, C., Netrapalli, P., Jordan, M.I.: Accelerated gradient descent escapes saddle points faster than gradient descent (2017). arXiv preprint arXiv:1711.10456
Keshavan, R.H., Montanari, A., Oh, S.: Matrix completion from a few entries. IEEE Trans. Inf. Theory 56(6), 2980–2998 (2010)
Kueng, R., Rauhut, H., Terstiege, U.: Low rank matrix recovery from rank one measurements. Appl. Comput. Harmonic Anal. 42(1), 88–116 (2017)
Lang, S.: Real and Functional Analysis, vol. 10, pp. 11–13. Springer, New York (1993)
Li, G., Gu, Y., Lu, Y.M.: Phase retrieval using iterative projections: Dynamics in the large systems limit. In: Allerton Conference on Communication, Control, and Computing, pp. 1114–1118. IEEE (2015)
Lu, Y.M., Li, G.: Phase transitions of spectral initialization for high-dimensional nonconvex estimation (2017). arXiv preprint arXiv:1702.06435
Li, X., Ling, S., Strohmer, T., Wei, K.: Rapid, robust, and reliable blind deconvolution via nonconvex optimization (2016). arXiv preprint arXiv:1606.04933
Li, Y., Ma, C., Chen, Y., Chi, Y.: Nonconvex matrix factorization from rank-one measurements (2018). arXiv preprint arXiv:1802.06286
Li, Y., Ma, T., Zhang, H.: Algorithmic regularization in over-parameterized matrix recovery (2017). arXiv preprint arXiv:1712.09203
Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient descent converges to minimizers (2016). arXiv preprint arXiv:1602.04915
Mondelli, M., Montanari, A.: Fundamental limits of weak recovery with applications to phase retrieval (2017). arXiv preprint arXiv:1708.05932
Murray, R., Swenson, B., Kar, S.: Revisiting normalized gradient descent: evasion of saddle points (2017). arXiv preprint arXiv:1711.05224
Ma, C., Wang, K., Chi, Y., Chen, Y.: Implicit regularization in nonconvex statistical estimation: gradient descent converges linearly for phase retrieval, matrix completion and blind deconvolution (2017). arXiv preprint arXiv:1711.10467
Ma, J., Xu, J., Maleki, A.: Optimization-based AMP for phase retrieval: the impact of initialization and \(\ell _2\)-regularization (2018). arXiv preprint arXiv:1801.01170
Netrapalli, P., Jain, P., Sanghavi, S.: Phase retrieval using alternating minimization. In: Advances in Neural Information Processing Systems, pp. 2796–2804 (2013)
Nesterov, Y., Polyak, B.T.: Cubic regularization of Newton method and its global performance. Math. Program. 108(1), 177–205 (2006)
Qu, Q., Zhang, Y., Eldar, Y.C., Wright, J.: Convolutional phase retrieval via gradient descent (2017). arXiv preprint arXiv:1712.00716
Sur, P., Chen, Y., Candès, E.J.: The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled chi-square. Probab. Theory Rel. Fields (accepted) (2018)
Shechtman, Y., Eldar, Y.C., Cohen, O., Chapman, H.N., Miao, J., Segev, M.: Phase retrieval with application to optical imaging: a contemporary overview. IEEE Signal Process. Mag. 32(3), 87–109 (2015)
Soltanolkotabi, M., Javanmard, A., Lee, J.D.: Theoretical insights into the optimization landscape of over-parameterized shallow neural networks (2017). arXiv preprint arXiv:1707.04926
Sun, R., Luo, Z.-Q.: Guaranteed matrix completion via non-convex factorization. IEEE Trans. Inf. Theory 62(11), 6535–6579 (2016)
Soltanolkotabi, M.: Algorithms and Theory for Clustering and Nonconvex Quadratic Programming. PhD thesis, Stanford University (2014)
Soltanolkotabi, M.: Structured signal recovery from quadratic measurements: breaking sample complexity barriers via nonconvex optimization (2017). arXiv preprint arXiv:1702.06175
Sun, J., Qu, Q., Wright, J.: A geometric analysis of phase retrieval. In: 2016 IEEE International Symposium on Information Theory (ISIT), pp. 2379–2383. IEEE (2016)
Schudy, W., Sviridenko, M.: Concentration and moment inequalities for polynomials of independent random variables. In: Proceedings of the Twenty-Third Annual ACM–SIAM Symposium on Discrete Algorithms, pp. 437–446. ACM, New York (2012)
Tu, S., Boczar, R., Simchowitz, M., Soltanolkotabi, M., Recht, B.: Low-rank solutions of linear matrix equations via procrustes flow. In: Proceedings of the 33rd International Conference on Machine Learning, vol. 48, pp. 964–973. JMLR.org (2016)
Tan, Y.S., Vershynin, R.: Phase retrieval via randomized Kaczmarz: theoretical guarantees (2017). arXiv preprint arXiv:1706.09993
Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices (2010). arXiv preprint arXiv:1011.3027
Wei, K.: Solving systems of phaseless equations via Kaczmarz methods: a proof of concept study. Inverse Probl. 31(12), 125008 (2015)
Wang, G., Giannakis, G.B., Eldar, Y.C.: Solving systems of random quadratic equations via truncated amplitude flow. IEEE Trans. Inf. Theory 64(2), 773–794 (2018)
Wang, G., Giannakis, G.B., Saad, Y., Chen, J.: Solving almost all systems of random quadratic equations (2017). arXiv preprint arXiv:1705.10407
Yang, Z., Yang, L.F., Fang, E.X., Zhao, T., Wang, Z., Neykov, M.: Misspecified nonconvex statistical optimization for phase retrieval (2017). arXiv preprint arXiv:1712.06245
Zhong, Y., Boumal, N.: Near-optimal bounds for phase synchronization (2017). arXiv preprint arXiv:1703.06605
Zhang, H., Chi, Y., Liang, Y.: Provable non-convex phase retrieval with outliers: median truncated Wirtinger flow. In: International Conference on Machine Learning, pp. 1022–1031 (2016)
Zhang, T.: Phase retrieval using alternating minimization in a batch setting (2017). arXiv preprint arXiv:1706.08167
Zheng, Q., Lafferty, J.: Convergence analysis for rectangular matrix completion using Burer–Monteiro factorization and gradient descent (2016). arXiv preprint arXiv:1605.07051
Zhang, L., Wang, G., Giannakis, G.B., Chen, J.: Compressive phase retrieval via reweighted amplitude flow (2017). arXiv preprint arXiv:1712.02426
Zhao, T., Wang, Z., Liu, H.: A nonconvex optimization framework for low rank matrix estimation. In: Advances in Neural Information Processing Systems, pp. 559–567 (2015)
Zhang, H., Zhou, Y., Liang, Y., Chi, Y.: A nonconvex approach for phase retrieval: reshaped Wirtinger flow and incremental algorithms. J. Mach. Learn. Res. 18(1), 5164–5198 (2017)
Acknowledgements
Y. Chen is supported in part by the AFOSR YIP award FA9550-19-1-0030, by the ARO grant W911NF-18-1-0303, by the ONR grant N00014-19-1-2120, and by the Princeton SEAS innovation award. Y. Chi is supported in part by AFOSR under the grant FA9550-15-1-0205, by ONR under the grant N00014-18-1-2142, by ARO under the grant W911NF-18-1-0303, and by NSF under the grants CAREER ECCS-1818571 and CCF-1806154. J. Fan is supported in part by NSF grants DMS-1662139 and DMS-1712591 and NIH grant 2R01-GM072611-13.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Chen, Y., Chi, Y., Fan, J. et al. Gradient descent with random initialization: fast global convergence for nonconvex phase retrieval. Math. Program. 176, 5–37 (2019). https://doi.org/10.1007/s10107-019-01363-6
DOI: https://doi.org/10.1007/s10107-019-01363-6