
Foundations of Computational Mathematics, Volume 19, Issue 4, pp 813–841

Solving Equations of Random Convex Functions via Anchored Regression

  • Sohail Bahmani
  • Justin Romberg
Article

Abstract

We consider the question of estimating a solution to a system of equations that involve convex nonlinearities, a problem that is common in machine learning and signal processing. Because of these nonlinearities, conventional estimators based on empirical risk minimization generally involve solving a non-convex optimization program. We propose anchored regression, a new approach based on convex programming that amounts to maximizing a linear functional (perhaps augmented by a regularizer) over a convex set. The proposed convex program is formulated in the natural space of the problem, and avoids the introduction of auxiliary variables, making it computationally favorable. Working in the native space also provides great flexibility as structural priors (e.g., sparsity) can be seamlessly incorporated. For our analysis, we model the equations as being drawn from a fixed set according to a probability law. Our main results provide guarantees on the accuracy of the estimator in terms of the number of equations we are solving, the amount of noise present, a measure of statistical complexity of the random equations, and the geometry of the regularizer at the true solution. We also provide recipes for constructing the anchor vector (that determines the linear functional to maximize) directly from the observed data.
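To make the formulation concrete, the following is a minimal sketch (not taken from the paper) of anchored regression for equations of the phase-retrieval type f_i(x) = ⟨a_i, x⟩², written with NumPy and CVXPY. The spectral construction of the anchor, the variable names A, y, anchor, and the slack delta are illustrative assumptions; the paper treats general convex f_i and gives principled, data-driven anchor constructions.

    import numpy as np
    import cvxpy as cp

    # Minimal sketch of anchored regression for equations of the form
    # f_i(x) = <a_i, x>^2 = y_i (phase retrieval). Illustrative only; the
    # slack `delta` and the spectral anchor below are assumptions, not the
    # paper's prescriptions.
    rng = np.random.default_rng(0)
    n, m = 20, 200                       # signal dimension, number of equations
    x_true = rng.standard_normal(n)
    A = rng.standard_normal((m, n))      # rows are the measurement vectors a_i
    y = (A @ x_true) ** 2                # noiseless observations f_i(x_true)

    # Anchor vector: leading eigenvector of the weighted covariance
    # sum_i y_i a_i a_i^T, a standard spectral construction that correlates
    # with the true solution (up to a global sign).
    M = (A * y[:, None]).T @ A / m
    anchor = np.linalg.eigh(M)[1][:, -1]

    # Anchored regression: maximize a linear functional over the convex set
    # cut out by the equations (with a small slack to absorb noise).
    delta = 1e-6
    x = cp.Variable(n)
    constraints = [cp.square(A[i] @ x) <= y[i] + delta for i in range(m)]
    cp.Problem(cp.Maximize(anchor @ x), constraints).solve()

    # The sign ambiguity of phase retrieval means x.value may match -x_true.
    err = min(np.linalg.norm(x.value - x_true), np.linalg.norm(x.value + x_true))
    print("relative error:", err / np.linalg.norm(x_true))

The program maximizes the linear functional ⟨anchor, x⟩ over the convex set {x : f_i(x) ≤ y_i + δ}, exactly the structure described in the abstract: it works in the natural parameter space, introduces no auxiliary variables, and a regularizer (e.g., an ℓ1 penalty for sparsity) could be subtracted from the objective without changing the convexity of the program.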

Keywords

Nonlinear regression · Convex programming · Anchored regression

Mathematics Subject Classification

62J02 · 62F10 · 90C25


Copyright information

© SFoCM 2018

Authors and Affiliations

  1. School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, USA
