Gradient descent with random initialization: fast global convergence for nonconvex phase retrieval

  • Full Length Paper
  • Series B
  • Mathematical Programming

Abstract

This paper considers the problem of solving systems of quadratic equations, namely, recovering an object of interest \(\varvec{x}^{\natural }\in {\mathbb {R}}^{n}\) from m quadratic equations/samples \(y_{i}=(\varvec{a}_{i}^{\top }\varvec{x}^{\natural })^{2}, 1\le i\le m\). This problem, also dubbed phase retrieval, spans multiple domains including the physical sciences and machine learning. We investigate the efficacy of gradient descent (or Wirtinger flow) designed for the nonconvex least-squares problem. We prove that under Gaussian designs, gradient descent—when randomly initialized—yields an \(\epsilon \)-accurate solution in \(O\big (\log n+\log (1/\epsilon )\big )\) iterations given nearly minimal samples, thus achieving near-optimal computational and sample complexities at once. This provides the first global convergence guarantee for vanilla gradient descent applied to phase retrieval, without the need for (i) carefully designed initialization, (ii) sample splitting, or (iii) sophisticated saddle-point escaping schemes. All of this is achieved by exploiting the statistical models when analyzing optimization algorithms, via a leave-one-out approach that decouples the statistical dependency between the gradient descent iterates and the data.
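
To make the setup concrete, here is a minimal NumPy sketch of the procedure the abstract describes: vanilla gradient descent on the nonconvex least-squares loss \(f(\varvec{x})=\frac{1}{4m}\sum _{i=1}^{m}\big [(\varvec{a}_{i}^{\top }\varvec{x})^{2}-y_{i}\big ]^{2}\) under Gaussian designs, started from a random initialization. The function name, step size, and iteration budget are our illustrative choices, not values prescribed by the paper.

    import numpy as np

    def wirtinger_flow(A, y, eta=0.1, num_iters=500, seed=None):
        """Vanilla GD on f(x) = (1/4m) * sum_i ((a_i^T x)^2 - y_i)^2, random init."""
        rng = np.random.default_rng(seed)
        m, n = A.shape
        x = rng.standard_normal(n) / np.sqrt(n)           # random initialization
        for _ in range(num_iters):
            Ax = A @ x
            x = x - (eta / m) * (A.T @ ((Ax ** 2 - y) * Ax))  # gradient step
        return x

    rng = np.random.default_rng(0)
    n, m = 100, 1000
    x_nat = rng.standard_normal(n)
    x_nat /= np.linalg.norm(x_nat)                        # unit-norm ground truth
    A = rng.standard_normal((m, n))                       # Gaussian design: a_i ~ N(0, I_n)
    y = (A @ x_nat) ** 2                                  # noiseless quadratic samples
    x_hat = wirtinger_flow(A, y, seed=1)
    # The signal is identifiable only up to a global sign, so check both candidates.
    print(min(np.linalg.norm(x_hat - x_nat), np.linalg.norm(x_hat + x_nat)))

In runs like this one, the error trace exhibits the two-stage behavior the analysis captures: a brief alignment phase followed by geometric decay.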

Notes

  1. An iterative algorithm is said to enjoy linear convergence if the iterates \(\{\varvec{x}^{t}\}\) converge geometrically fast to the minimizer \(\varvec{x}^{\natural }\); a precise statement is given in the first supplement following these notes.

  2. Here, we do not take the absolute value of \(x_{\parallel }^{t}\). As we shall see later, the \(x_{\parallel }^{t}\)’s are of the same sign throughout the execution of the algorithm.

  3. More specifically, the GD update \(\varvec{x}^{t+1}=\varvec{x}^{t}-m^{-1}{\eta _{t}}\sum _{i=1}^{m}\big [(\varvec{a}_{i}^{\top }\varvec{x}^{t})^{2}-y_{i}\big ]\varvec{a}_{i}\varvec{a}_{i}^{\top }\varvec{x}^{t}\approx (\varvec{I}+m^{-1}{\eta _{t}}\sum _{i=1}^{m}y_{i}\varvec{a}_{i}\varvec{a}_{i}^{\top })\varvec{x}^{t}\) when \(\varvec{x}^{t}\approx \varvec{0}\), which is equivalent to a power iteration (without normalization) w.r.t. the data matrix \(\varvec{I}+m^{-1}{\eta _{t}}\sum _{i=1}^{m}y_{i}\varvec{a}_{i}\varvec{a}_{i}^{\top }\); a numerical sanity check of this approximation is given in the supplement following these notes.

  4. When applied to phase retrieval with \(m\asymp n\,\mathrm {poly}\log n\), one has \(L\asymp n\), \(\rho \asymp n\), \(\theta \asymp \gamma \asymp 1\) (see [59, Theorem 2.2]), \(\alpha \asymp 1\), and \(\beta \gtrsim n\) (ignoring logarithmic factors).

  5. This is because of the rotational invariance of Gaussian distributions; the reduction is spelled out in the last supplement following these notes.
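
Supplement to note 1 (our formulation, not a verbatim statement from the paper): an iterative algorithm converges linearly if there exist constants \(C>0\) and \(\rho \in (0,1)\) such that \(\mathrm{dist}(\varvec{x}^{t},\varvec{x}^{\natural })\le C\rho ^{t}\,\mathrm{dist}(\varvec{x}^{0},\varvec{x}^{\natural })\) for all \(t\ge 0\), where for phase retrieval \(\mathrm{dist}(\varvec{x},\varvec{x}^{\natural }):=\min \big \{\Vert \varvec{x}-\varvec{x}^{\natural }\Vert _{2},\,\Vert \varvec{x}+\varvec{x}^{\natural }\Vert _{2}\big \}\) accounts for the unrecoverable global sign. Reaching \(\epsilon \)-accuracy from a bounded starting distance then takes \(O(\log (1/\epsilon ))\) iterations.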
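
Supplement to note 3: a numerical sanity check (our own, with illustrative parameter choices) of the claim that near the origin a single GD step is essentially an unnormalized power-iteration step w.r.t. \(\varvec{M}=\varvec{I}+m^{-1}\eta \sum _{i=1}^{m}y_{i}\varvec{a}_{i}\varvec{a}_{i}^{\top }\); the neglected term involves \((\varvec{a}_{i}^{\top }\varvec{x})^{3}\) and hence scales as \(\Vert \varvec{x}\Vert _{2}^{3}\).

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, eta = 50, 500, 0.1
    x_nat = rng.standard_normal(n)
    x_nat /= np.linalg.norm(x_nat)
    A = rng.standard_normal((m, n))
    y = (A @ x_nat) ** 2

    x = 1e-6 * rng.standard_normal(n)           # a point very close to the origin
    Ax = A @ x
    gd_step = x - (eta / m) * (A.T @ ((Ax ** 2 - y) * Ax))
    M = np.eye(n) + (eta / m) * (A.T * y) @ A   # I + (eta/m) sum_i y_i a_i a_i^T
    power_step = M @ x
    # Relative deviation from an exact power-iteration step; it is of
    # order ||x||^2, here around 1e-12.
    print(np.linalg.norm(gd_step - power_step) / np.linalg.norm(power_step))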
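
Supplement to note 5 (our paraphrase of the standard reduction): if \(\varvec{a}_{i}\sim {\mathcal {N}}(\varvec{0},\varvec{I}_{n})\) and \(\varvec{U}\) is any fixed \(n\times n\) orthonormal matrix, then \(\varvec{U}^{\top }\varvec{a}_{i}\sim {\mathcal {N}}(\varvec{0},\varvec{I}_{n})\) as well, while \(\varvec{a}_{i}^{\top }\varvec{x}^{\natural }=(\varvec{U}^{\top }\varvec{a}_{i})^{\top }(\varvec{U}^{\top }\varvec{x}^{\natural })\). Choosing \(\varvec{U}\) whose first column is \(\varvec{x}^{\natural }/\Vert \varvec{x}^{\natural }\Vert _{2}\) gives \(\varvec{U}^{\top }\varvec{x}^{\natural }=\Vert \varvec{x}^{\natural }\Vert _{2}\,\varvec{e}_{1}\), so the samples \(\{y_{i}\}\) are distributed exactly as if \(\varvec{x}^{\natural }\) were aligned with \(\varvec{e}_{1}\); one may therefore assume this alignment without loss of generality.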

References

  1. Agarwal, N., Allen-Zhu, Z., Bullins, B., Hazan, E., Ma, T.: Finding approximate local minima for nonconvex optimization in linear time (2016). arXiv preprint arXiv:1611.01146

  2. Abbe, E., Fan, J., Wang, K., Zhong, Y.: Entrywise eigenvector analysis of random matrices with low expected rank (2017). arXiv preprint arXiv:1709.09565

  3. Allen-Zhu, Z.: Natasha 2: faster non-convex optimization than SGD (2017). arXiv preprint arXiv:1708.08694

  4. Bandeira, A.S., Cahill, J., Mixon, D.G., Nelson, A.A.: Saving phase: injectivity and stability for phase retrieval. Appl. Comput. Harmonic Anal. 37(1), 106–125 (2014)

  5. Bendory, T., Eldar, Y.C., Boumal, N.: Non-convex phase retrieval from STFT measurements. IEEE Trans. Inf. Theory 64(1), 467–484 (2018)

  6. Chen, Y., Candès, E.J.: Solving random quadratic systems of equations is nearly as easy as solving linear systems. Commun. Pure Appl. Math. 70(5), 822–883 (2017)

  7. Chen, Y., Candès, E.: The projected power method: an efficient algorithm for joint alignment from pairwise differences. Commun. Pure Appl. Math. 71(8), 1648–1714 (2018)

  8. Chen, Y., Cheng, C., Fan, J.: Asymmetry helps: eigenvalue and eigenvector analyses of asymmetrically perturbed low-rank matrices (2018). arXiv preprint arXiv:1811.12804

  9. Chen, Y., Chi, Y., Goldsmith, A.J.: Exact and stable covariance estimation from quadratic sampling via convex programming. IEEE Trans. Inf. Theory 61(7), 4034–4059 (2015)

  10. Candès, E.J., Eldar, Y.C., Strohmer, T., Voroninski, V.: Phase retrieval via matrix completion. SIAM J. Imaging Sci. 6(1), 199–225 (2013)

  11. Chen, P., Fannjiang, A., Liu, G.-R.: Phase retrieval with one or two diffraction patterns by alternating projections with the null initialization. J. Fourier Anal. Appl. 24(3), 719–758 (2018)

  12. Chen, Y., Fan, J., Ma, C., Wang, K.: Spectral method and regularized MLE are both optimal for top-\(K\) ranking (2017). arXiv preprint arXiv:1707.09971

  13. Candès, E.J., Li, X.: Solving quadratic equations via PhaseLift when there are about as many equations as unknowns. Found. Comput. Math. 14(5), 1017–1026 (2014)

  14. Chi, Y., Lu, Y.M.: Kaczmarz method for solving quadratic equations. IEEE Signal Process. Lett. 23(9), 1183–1187 (2016)

  15. Chen, J., Li, X.: Memory-efficient kernel PCA via partial matrix sampling and nonconvex optimization: a model-free analysis of local minima (2017). arXiv preprint arXiv:1711.01742

  16. Chi, Y., Lu, Y.M., Chen, Y.: Nonconvex optimization meets low-rank matrix factorization: an overview (2018). arXiv preprint arXiv:1809.09573

  17. Cai, T.T., Li, X., Ma, Z.: Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow. Ann. Stat. 44(5), 2221–2251 (2016)

  18. Candès, E.J., Li, X., Soltanolkotabi, M.: Phase retrieval via Wirtinger flow: theory and algorithms. IEEE Trans. Inf. Theory 61(4), 1985–2007 (2015)

  19. Cai, J.-F., Liu, H., Wang, Y.: Fast rank one alternating minimization algorithm for phase retrieval (2017). arXiv preprint arXiv:1708.08751

  20. Candès, E.J., Strohmer, T., Voroninski, V.: PhaseLift: exact and stable signal recovery from magnitude measurements via convex programming. Commun. Pure Appl. Math. 66(8), 1241–1274 (2013)

  21. Chen, Y., Wainwright, M.J.: Fast low-rank estimation by projected gradient descent: general statistical and algorithmic guarantees (2015). arXiv preprint arXiv:1509.03025

  22. Chen, J., Wang, L., Zhang, X., Gu, Q.: Robust Wirtinger flow for phase retrieval with arbitrary corruption (2017). arXiv preprint arXiv:1704.06256

  23. Chen, Y., Yi, X., Caramanis, C.: A convex formulation for mixed regression with two components: minimax optimal rates. In: Conference on Learning Theory, pp. 560–604 (2014)

  24. Cai, T., Zhang, A.: ROP: matrix recovery via rank-one projections. Ann. Stat. 43(1), 102–138 (2015)

  25. Demanet, L., Hand, P.: Stable optimizationless recovery from phaseless linear measurements. J. Fourier Anal. Appl. 20(1), 199–221 (2014)

  26. Du, S.S., Jin, C., Lee, J.D., Jordan, M.I., Singh, A., Poczos, B.: Gradient descent can take exponential time to escape saddle points. In: Advances in Neural Information Processing Systems, pp. 1067–1077 (2017)

  27. Duchi, J.C., Ruan, F.: Solving (most) of a set of quadratic equalities: composite optimization for robust phase retrieval (2017). arXiv preprint arXiv:1705.02356

  28. El Karoui, N.: On the impact of predictor geometry on the performance of high-dimensional ridge-regularized generalized robust regression estimators. Probab. Theory Rel. Fields 170(1–2), 95–175 (2018)

  29. El Karoui, N., Bean, D., Bickel, P.J., Lim, C., Yu, B.: On robust regression with high-dimensional predictors. Proc. Natl. Acad. Sci. 110(36), 14557–14562 (2013)

  30. Fu, H., Chi, Y., Liang, Y.: Local geometry of one-hidden-layer neural networks for logistic regression (2018). arXiv preprint arXiv:1802.06463

  31. Ge, R., Huang, F., Jin, C., Yuan, Y.: Escaping from saddle points – online stochastic gradient for tensor decomposition. In: Conference on Learning Theory, pp. 797–842 (2015)

  32. Gao, B., Xu, Z.: Phase retrieval using Gauss–Newton method (2016). arXiv preprint arXiv:1606.08135

  33. Huang, W., Hand, P.: Blind deconvolution by a steepest descent algorithm on a quotient manifold (2017). arXiv preprint arXiv:1710.03309

  34. Hao, B., Zhang, A., Cheng, G.: Sparse and low-rank tensor estimation via cubic sketchings (2018). arXiv preprint arXiv:1801.09326

  35. Jin, C., Ge, R., Netrapalli, P., Kakade, S.M., Jordan, M.I.: How to escape saddle points efficiently (2017). arXiv preprint arXiv:1703.00887

  36. Jin, C., Netrapalli, P., Jordan, M.I.: Accelerated gradient descent escapes saddle points faster than gradient descent (2017). arXiv preprint arXiv:1711.10456

  37. Keshavan, R.H., Montanari, A., Oh, S.: Matrix completion from a few entries. IEEE Trans. Inf. Theory 56(6), 2980–2998 (2010)

  38. Kueng, R., Rauhut, H., Terstiege, U.: Low rank matrix recovery from rank one measurements. Appl. Comput. Harmonic Anal. 42(1), 88–116 (2017)

  39. Lang, S.: Real and Functional Analysis, vol. 10, pp. 11–13. Springer, New York (1993)

  40. Li, G., Gu, Y., Lu, Y.M.: Phase retrieval using iterative projections: dynamics in the large systems limit. In: Allerton Conference on Communication, Control, and Computing, pp. 1114–1118. IEEE (2015)

  41. Lu, Y.M., Li, G.: Phase transitions of spectral initialization for high-dimensional nonconvex estimation (2017). arXiv preprint arXiv:1702.06435

  42. Li, X., Ling, S., Strohmer, T., Wei, K.: Rapid, robust, and reliable blind deconvolution via nonconvex optimization (2016). arXiv preprint arXiv:1606.04933

  43. Li, Y., Ma, C., Chen, Y., Chi, Y.: Nonconvex matrix factorization from rank-one measurements (2018). arXiv preprint arXiv:1802.06286

  44. Li, Y., Ma, T., Zhang, H.: Algorithmic regularization in over-parameterized matrix recovery (2017). arXiv preprint arXiv:1712.09203

  45. Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient descent converges to minimizers (2016). arXiv preprint arXiv:1602.04915

  46. Mondelli, M., Montanari, A.: Fundamental limits of weak recovery with applications to phase retrieval (2017). arXiv preprint arXiv:1708.05932

  47. Murray, R., Swenson, B., Kar, S.: Revisiting normalized gradient descent: evasion of saddle points (2017). arXiv preprint arXiv:1711.05224

  48. Ma, C., Wang, K., Chi, Y., Chen, Y.: Implicit regularization in nonconvex statistical estimation: gradient descent converges linearly for phase retrieval, matrix completion and blind deconvolution (2017). arXiv preprint arXiv:1711.10467

  49. Ma, J., Xu, J., Maleki, A.: Optimization-based AMP for phase retrieval: the impact of initialization and \(\ell _2\)-regularization (2018). arXiv preprint arXiv:1801.01170

  50. Netrapalli, P., Jain, P., Sanghavi, S.: Phase retrieval using alternating minimization. In: Advances in Neural Information Processing Systems, pp. 2796–2804 (2013)

  51. Nesterov, Y., Polyak, B.T.: Cubic regularization of Newton method and its global performance. Math. Program. 108(1), 177–205 (2006)

  52. Qu, Q., Zhang, Y., Eldar, Y.C., Wright, J.: Convolutional phase retrieval via gradient descent (2017). arXiv preprint arXiv:1712.00716

  53. Sur, P., Chen, Y., Candès, E.J.: The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled chi-square. Probab. Theory Rel. Fields (to appear) (2018)

  54. Shechtman, Y., Eldar, Y.C., Cohen, O., Chapman, H.N., Miao, J., Segev, M.: Phase retrieval with application to optical imaging: a contemporary overview. IEEE Signal Process. Mag. 32(3), 87–109 (2015)

  55. Soltanolkotabi, M., Javanmard, A., Lee, J.D.: Theoretical insights into the optimization landscape of over-parameterized shallow neural networks (2017). arXiv preprint arXiv:1707.04926

  56. Sun, R., Luo, Z.-Q.: Guaranteed matrix completion via non-convex factorization. IEEE Trans. Inf. Theory 62(11), 6535–6579 (2016)

  57. Soltanolkotabi, M.: Algorithms and Theory for Clustering and Nonconvex Quadratic Programming. PhD thesis, Stanford University (2014)

  58. Soltanolkotabi, M.: Structured signal recovery from quadratic measurements: breaking sample complexity barriers via nonconvex optimization (2017). arXiv preprint arXiv:1702.06175

  59. Sun, J., Qu, Q., Wright, J.: A geometric analysis of phase retrieval. In: 2016 IEEE International Symposium on Information Theory (ISIT), pp. 2379–2383. IEEE (2016)

  60. Schudy, W., Sviridenko, M.: Concentration and moment inequalities for polynomials of independent random variables. In: Proceedings of the Twenty-Third Annual ACM–SIAM Symposium on Discrete Algorithms, pp. 437–446. ACM, New York (2012)

  61. Tu, S., Boczar, R., Simchowitz, M., Soltanolkotabi, M., Recht, B.: Low-rank solutions of linear matrix equations via Procrustes flow. In: Proceedings of the 33rd International Conference on Machine Learning, vol. 48, pp. 964–973. JMLR.org (2016)

  62. Tan, Y.S., Vershynin, R.: Phase retrieval via randomized Kaczmarz: theoretical guarantees (2017). arXiv preprint arXiv:1706.09993

  63. Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices (2010). arXiv preprint arXiv:1011.3027

  64. Wei, K.: Solving systems of phaseless equations via Kaczmarz methods: a proof of concept study. Inverse Probl. 31(12), 125008 (2015)

  65. Wang, G., Giannakis, G.B., Eldar, Y.C.: Solving systems of random quadratic equations via truncated amplitude flow. IEEE Trans. Inf. Theory 64(2), 773–794 (2018)

  66. Wang, G., Giannakis, G.B., Saad, Y., Chen, J.: Solving almost all systems of random quadratic equations (2017). arXiv preprint arXiv:1705.10407

  67. Yang, Z., Yang, L.F., Fang, E.X., Zhao, T., Wang, Z., Neykov, M.: Misspecified nonconvex statistical optimization for phase retrieval (2017). arXiv preprint arXiv:1712.06245

  68. Zhong, Y., Boumal, N.: Near-optimal bounds for phase synchronization (2017). arXiv preprint arXiv:1703.06605

  69. Zhang, H., Chi, Y., Liang, Y.: Provable non-convex phase retrieval with outliers: median truncated Wirtinger flow. In: International Conference on Machine Learning, pp. 1022–1031 (2016)

  70. Zhang, T.: Phase retrieval using alternating minimization in a batch setting (2017). arXiv preprint arXiv:1706.08167

  71. Zheng, Q., Lafferty, J.: Convergence analysis for rectangular matrix completion using Burer–Monteiro factorization and gradient descent (2016). arXiv preprint arXiv:1605.07051

  72. Zhang, L., Wang, G., Giannakis, G.B., Chen, J.: Compressive phase retrieval via reweighted amplitude flow (2017). arXiv preprint arXiv:1712.02426

  73. Zhao, T., Wang, Z., Liu, H.: A nonconvex optimization framework for low rank matrix estimation. In: Advances in Neural Information Processing Systems, pp. 559–567 (2015)

  74. Zhang, H., Zhou, Y., Liang, Y., Chi, Y.: A nonconvex approach for phase retrieval: reshaped Wirtinger flow and incremental algorithms. J. Mach. Learn. Res. 18(1), 5164–5198 (2017)


Acknowledgements

Y. Chen is supported in part by the AFOSR YIP award FA9550-19-1-0030, by the ARO grant W911NF-18-1-0303, by the ONR grant N00014-19-1-2120, and by the Princeton SEAS innovation award. Y. Chi is supported in part by AFOSR under the grant FA9550-15-1-0205, by ONR under the grant N00014-18-1-2142, by ARO under the grant W911NF-18-1-0303, and by NSF under the grants CAREER ECCS-1818571 and CCF-1806154. J. Fan is supported in part by NSF grants DMS-1662139 and DMS-1712591 and NIH grant 2R01-GM072611-13.

Author information

Correspondence to Cong Ma.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 746 KB)

About this article

Cite this article

Chen, Y., Chi, Y., Fan, J. et al. Gradient descent with random initialization: fast global convergence for nonconvex phase retrieval. Math. Program. 176, 5–37 (2019). https://doi.org/10.1007/s10107-019-01363-6
