# Solving Equations of Random Convex Functions via Anchored Regression

## Abstract

We consider the question of estimating a solution to a system of equations that involve convex nonlinearities, a problem that is common in machine learning and signal processing. Because of these nonlinearities, conventional estimators based on empirical risk minimization generally involve solving a non-convex optimization program. We propose anchored regression, a new approach based on convex programming that amounts to maximizing a linear functional (perhaps augmented by a regularizer) over a convex set. The proposed convex program is formulated in the natural space of the problem, and avoids the introduction of auxiliary variables, making it computationally favorable. Working in the native space also provides great flexibility as structural priors (e.g., sparsity) can be seamlessly incorporated. For our analysis, we model the equations as being drawn from a fixed set according to a probability law. Our main results provide guarantees on the accuracy of the estimator in terms of the number of equations we are solving, the amount of noise present, a measure of statistical complexity of the random equations, and the geometry of the regularizer at the true solution. We also provide recipes for constructing the anchor vector (that determines the linear functional to maximize) directly from the observed data.
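As a rough illustration of "maximizing a linear functional over a convex set", the sketch below works through a toy noiseless instance in two dimensions. The abstract does not spell out the program, so everything specific here is an assumption made for illustration: the equations are taken to be $$|\langle a_m, x\rangle| = y_m$$, the convex set is taken to be $$\{x : |\langle a_m, x\rangle| \le y_m \text{ for all } m\}$$, and the names `anchored_estimate_2d`, `vectors`, `observations`, and `anchor` are hypothetical. The brute-force vertex enumeration stands in for a proper convex solver and is only practical for tiny problems.

```python
from itertools import combinations


def anchored_estimate_2d(vectors, observations, anchor, tol=1e-9):
    """Maximize <anchor, x> subject to |<a_m, x>| <= y_m for all m (2-D only).

    Illustrative sketch: the constraint form is an assumption, not taken
    from the abstract.
    """
    # Each constraint |<a, x>| <= y is the pair of half-planes
    # <a, x> <= y and <-a, x> <= y, stored as (p, q, r) for p*x1 + q*x2 <= r.
    halfplanes = []
    for (a1, a2), y in zip(vectors, observations):
        halfplanes.append((a1, a2, y))
        halfplanes.append((-a1, -a2, y))

    def feasible(x):
        return all(p * x[0] + q * x[1] <= r + tol for p, q, r in halfplanes)

    best, best_value = None, float("-inf")
    # A linear objective over a bounded polygon attains its maximum at a
    # vertex, and every vertex is the intersection of two boundary lines.
    for (p1, q1, r1), (p2, q2, r2) in combinations(halfplanes, 2):
        det = p1 * q2 - p2 * q1
        if abs(det) < tol:
            continue  # parallel boundaries never intersect in a single point
        # Cramer's rule for the 2x2 system defined by the two boundary lines.
        x = ((r1 * q2 - r2 * q1) / det, (p1 * r2 - p2 * r1) / det)
        if feasible(x):
            value = anchor[0] * x[0] + anchor[1] * x[1]
            if value > best_value:
                best, best_value = x, value
    return best


# Toy data (hypothetical): the true solution and six measurement vectors.
x_true = (1.0, 2.0)
vectors = [(1, 0), (0, 1), (1, 1), (1, -1), (2, 1), (1, -2)]
observations = [abs(a1 * x_true[0] + a2 * x_true[1]) for a1, a2 in vectors]

# The anchor only needs to be roughly aligned with x_true; it also breaks
# the sign ambiguity, since -x_true satisfies the same equations.
anchor = (0.9, 1.6)
x_hat = anchored_estimate_2d(vectors, observations, anchor)
print(x_hat)
```

In this toy case both `x_true` and `-x_true` satisfy every equation, and the anchor is what selects between them. For realistic dimensions the same program would be handed to a linear or convex programming solver rather than enumerated by hand.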

## Notes

1. Of course, we need to be able to evaluate each $$f_m$$ and some of its derivatives to actually solve (1.6).

2. Unlike the conventional definition of Rademacher complexity, we normalize by the square root of the number of samples.

3. Because $$\psi _{\alpha }\left( \cdot \right)$$ is bounded, we can interpret the case $$t=0$$ as the limit $$t\rightarrow 0$$, which avoids division by zero.

## Author information

### Corresponding author

Correspondence to Sohail Bahmani.

Communicated by Francis Bach.

This work was supported in part by the Semiconductor Research Corporation (SRC) and DARPA.

## Rights and permissions

Bahmani, S., Romberg, J. Solving Equations of Random Convex Functions via Anchored Regression. Found Comput Math 19, 813–841 (2019). https://doi.org/10.1007/s10208-018-9401-4

• DOI: https://doi.org/10.1007/s10208-018-9401-4