Median-Truncated Gradient Descent: A Robust and Scalable Nonconvex Approach for Signal Estimation

  • Yuejie Chi
  • Yuanxin Li
  • Huishuai Zhang
  • Yingbin Liang
Chapter
Part of the Applied and Numerical Harmonic Analysis book series (ANHA)

Abstract

Recent work has demonstrated that, with a proper initialization, gradient descent can directly estimate high-dimensional signals via nonconvex optimization in a globally convergent manner. However, its performance is highly sensitive to adversarial outliers that may take arbitrary values. In this chapter, we introduce the median-Truncated Gradient Descent (median-TGD) algorithm, which improves the robustness of gradient descent against outliers, and apply it to two celebrated problems: low-rank matrix recovery and phase retrieval. In each iteration, median-TGD truncates the contributions of samples that deviate significantly from the sample median, thereby stabilizing the search direction. Encouragingly, when initialized in a neighborhood of the ground truth known as the basin of attraction, median-TGD converges to the ground truth at a linear rate under Gaussian designs with a near-optimal number of measurements, even when a constant fraction of the measurements is arbitrarily corrupted. In addition, we introduce a new median-truncated spectral method that ensures an initialization in the basin of attraction. Stability against additional dense bounded noise is also established. Numerical experiments validate the superior performance of median-TGD.
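To make the median truncation idea concrete, below is a minimal NumPy sketch for real-valued phase retrieval from intensity measurements. The function name, the Wirtinger-flow-style step-size heuristic, and the simplified threshold (a constant multiple of the median residual) are illustrative assumptions for this sketch; the chapter's precise truncation rule and parameter choices may differ.

```python
import numpy as np

def median_tgd_phase_retrieval(A, y, z0, mu=0.2, alpha=3.0, n_iters=500):
    """Sketch of median-truncated gradient descent for real-valued phase
    retrieval from intensity measurements y_i = (a_i^T x)^2, a constant
    fraction of which may be arbitrarily corrupted by outliers.

    A:  (m, n) matrix whose rows are the measurement vectors a_i
    z0: initial estimate (in the basin of attraction, e.g. from a
        median-truncated spectral method)
    """
    m = A.shape[0]
    z = z0.copy()
    step = mu / np.linalg.norm(z0) ** 2   # Wirtinger-flow-style step size
    for _ in range(n_iters):
        residuals = A @ z                  # a_i^T z for all i
        errors = np.abs(residuals**2 - y)  # per-sample residual magnitudes
        # Median truncation (simplified): keep only samples whose residual
        # is within a constant multiple of the sample median, screening out
        # likely outliers before they can corrupt the search direction.
        keep = errors <= alpha * np.median(errors)
        grad = A[keep].T @ ((residuals[keep]**2 - y[keep]) * residuals[keep]) / m
        z = z - step * grad
    return z

# Hypothetical usage: 10% of the measurements are replaced by arbitrary values.
rng = np.random.default_rng(0)
n, m = 50, 400
x = rng.standard_normal(n)
A = rng.standard_normal((m, n))
y = (A @ x) ** 2
outliers = rng.choice(m, size=m // 10, replace=False)
y[outliers] = rng.uniform(0, 100, size=outliers.size)
z0 = x + 0.3 * rng.standard_normal(n)  # stand-in for a spectral initialization
z = median_tgd_phase_retrieval(A, y, z0)
print(min(np.linalg.norm(z - x), np.linalg.norm(z + x)))  # global sign ambiguity
```

The key design point is that the median is computed afresh at every iteration: unlike a fixed truncation threshold, the median adapts to the current residual distribution and remains stable even when a constant fraction of the residuals is arbitrarily large.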

Acknowledgements

The work of Y. Chi and Y. Li is supported in part by AFOSR under the grant FA9550-15-1-0205, by ONR under the grant N00014-18-1-2142, by ARO under the grant W911NF-18-1-0303, and by NSF under the grants CAREER ECCS-1818571 and CCF-1806154. The work of Y. Liang is supported in part by NSF under the grants CCF-1761506 and ECCS-1818904.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Yuejie Chi (1)
  • Yuanxin Li (1)
  • Huishuai Zhang (2)
  • Yingbin Liang (3)

  1. Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA
  2. Microsoft Research Asia, Beijing, China
  3. Department of Electrical and Computer Engineering, The Ohio State University, Columbus, USA
