Solving Equations of Random Convex Functions via Anchored Regression
We consider the question of estimating a solution to a system of equations that involve convex nonlinearities, a problem that is common in machine learning and signal processing. Because of these nonlinearities, conventional estimators based on empirical risk minimization generally involve solving a non-convex optimization program. We propose anchored regression, a new approach based on convex programming that amounts to maximizing a linear functional (perhaps augmented by a regularizer) over a convex set. The proposed convex program is formulated in the natural space of the problem, and avoids the introduction of auxiliary variables, making it computationally favorable. Working in the native space also provides great flexibility as structural priors (e.g., sparsity) can be seamlessly incorporated. For our analysis, we model the equations as being drawn from a fixed set according to a probability law. Our main results provide guarantees on the accuracy of the estimator in terms of the number of equations we are solving, the amount of noise present, a measure of statistical complexity of the random equations, and the geometry of the regularizer at the true solution. We also provide recipes for constructing the anchor vector (that determines the linear functional to maximize) directly from the observed data.
Keywords: Nonlinear regression · Convex programming · Anchored regression
Mathematics Subject Classification: 62J02 · 62F10 · 90C25
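To make the formulation concrete, below is a minimal sketch (not the authors' code) of anchored regression for one instance covered by this framework: phase retrieval, where each equation f_i(x) = ⟨a_i, x⟩² = y_i involves a convex f_i, and the estimator maximizes the linear functional ⟨a, x⟩ over the convex set {x : f_i(x) ≤ y_i}. The Gaussian measurement model, the noiseless per-equation constraints (a simplification of formulations that relax the constraints to accommodate noise), and the spectral construction of the anchor are illustrative assumptions, not necessarily the recipes proposed in the paper; the cvxpy modeling package expresses the convex program.

```python
# Sketch of anchored regression for phase retrieval (assumptions noted above).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, m = 20, 200                      # signal dimension, number of equations
x_true = rng.standard_normal(n)
A = rng.standard_normal((m, n))     # rows a_i drawn i.i.d. Gaussian
y = (A @ x_true) ** 2               # noiseless convex "equations" f_i(x*) = y_i

# Anchor built from the observed data: leading eigenvector of
# (1/m) * sum_i y_i a_i a_i^T. This spectral initializer is one common
# data-driven choice in the phase-retrieval literature, used here as an
# illustrative stand-in for the paper's anchor constructions.
M = (A.T * y) @ A / m
eigvals, eigvecs = np.linalg.eigh(M)
anchor = eigvecs[:, -1]

# Anchored regression: maximize <anchor, x> subject to <a_i, x>^2 <= y_i,
# i.e. |<a_i, x>| <= sqrt(y_i), a convex program in the natural space of x
# (no lifting, no auxiliary variables).
x = cp.Variable(n)
constraints = [cp.abs(A @ x) <= np.sqrt(y)]
prob = cp.Problem(cp.Maximize(anchor @ x), constraints)
prob.solve()

# The global sign of x is unidentifiable from y; report error up to sign.
x_hat = x.value
err = min(np.linalg.norm(x_hat - x_true), np.linalg.norm(x_hat + x_true))
print(f"relative error (up to sign): {err / np.linalg.norm(x_true):.2e}")
```

A structural prior such as sparsity would enter this sketch by subtracting a regularization term, e.g. replacing the objective with cp.Maximize(anchor @ x - lam * cp.norm1(x)) for some weight lam, which keeps the program convex and in the native space.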