
Solving Equations of Random Convex Functions via Anchored Regression


Abstract

We consider the question of estimating a solution to a system of equations that involve convex nonlinearities, a problem that is common in machine learning and signal processing. Because of these nonlinearities, conventional estimators based on empirical risk minimization generally involve solving a non-convex optimization program. We propose anchored regression, a new approach based on convex programming that amounts to maximizing a linear functional (perhaps augmented by a regularizer) over a convex set. The proposed convex program is formulated in the natural space of the problem, and avoids the introduction of auxiliary variables, making it computationally favorable. Working in the native space also provides great flexibility as structural priors (e.g., sparsity) can be seamlessly incorporated. For our analysis, we model the equations as being drawn from a fixed set according to a probability law. Our main results provide guarantees on the accuracy of the estimator in terms of the number of equations we are solving, the amount of noise present, a measure of statistical complexity of the random equations, and the geometry of the regularizer at the true solution. We also provide recipes for constructing the anchor vector (that determines the linear functional to maximize) directly from the observed data.
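
To make the shape of the estimator concrete, the following is a minimal sketch in Python (using CVXPY), specialized to a noiseless phase-retrieval-type instance in which each equation is \(f_m(x) = \langle b_m, x\rangle ^2 = y_m\). The spectral-type anchor below is one illustrative choice built from the observed data; it is an assumption made for this sketch and need not coincide with the anchor recipes or the exact program analyzed in the paper. All names (B, y, anchor) are placeholders.

import numpy as np
import cvxpy as cp

# Toy noiseless instance: y_m = <b_m, x_true>^2 with convex f_m(x) = <b_m, x>^2.
rng = np.random.default_rng(0)
n, M = 20, 200
B = rng.standard_normal((M, n))   # rows are the measurement vectors b_m
x_true = rng.standard_normal(n)
y = (B @ x_true) ** 2

# Illustrative anchor: leading eigenvector of the weighted sample covariance
# (a standard spectral initializer; the paper's anchor constructions may differ).
C = (B.T * y) @ B / M
eigenvalues, eigenvectors = np.linalg.eigh(C)
anchor = eigenvectors[:, -1]

# Anchored regression, noiseless specialization: maximize the linear functional
# <anchor, x> over the convex set {x : f_m(x) <= y_m for all m}.
x = cp.Variable(n)
problem = cp.Problem(cp.Maximize(anchor @ x), [cp.square(B @ x) <= y])
problem.solve()

# In this symmetric example the solution is determined only up to a global sign.
x_hat = x.value
error = min(np.linalg.norm(x_hat - x_true), np.linalg.norm(x_hat + x_true))
print("relative error:", error / np.linalg.norm(x_true))

With noisy data the constraint set would be relaxed accordingly, and a regularizer could be subtracted from the linear objective to encode structural priors such as sparsity, as described above.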


Notes

  1. Of course, we need to be able to evaluate the \(f_m\) and some number of their derivatives to actually solve (1.6).

  2. Unlike the conventional definition of Rademacher complexity, we normalize by the square root of the number of samples rather than by the number of samples itself; see the sketch following these notes.

  3. Because \(\psi _{\alpha }\left( \cdot \right) \) is bounded, we can treat \(t=0\) as \(t\rightarrow 0\) to avoid the issue of division by zero.
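
For concreteness, a brief sketch of the convention in Note 2, in illustrative notation (an i.i.d. sample \(x_1,\dots ,x_M\), independent Rademacher signs \(\varepsilon _1,\dots ,\varepsilon _M\), and a function class \(\mathcal {F}\); these symbols are placeholders, not necessarily the paper's notation). The conventional empirical Rademacher complexity is
\[ \mathcal {R}_M\left( \mathcal {F}\right) = \mathbb {E}_{\varepsilon }\, \sup _{f\in \mathcal {F}}\ \frac{1}{M}\sum _{m=1}^{M}\varepsilon _m f\left( x_m\right) , \]
whereas the normalization used here replaces the factor \(1/M\) by \(1/\sqrt{M}\), so the resulting quantity equals \(\sqrt{M}\,\mathcal {R}_M\left( \mathcal {F}\right) \).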


Author information

Corresponding author: Sohail Bahmani.

Additional information

Communicated by Francis Bach.

This work was supported in part by the Semiconductor Research Corporation (SRC) and DARPA.


About this article


Cite this article

Bahmani, S., Romberg, J. Solving Equations of Random Convex Functions via Anchored Regression. Found Comput Math 19, 813–841 (2019). https://doi.org/10.1007/s10208-018-9401-4

