Random Design Analysis of Ridge Regression
This work gives a simultaneous analysis of both the ordinary least squares estimator and the ridge regression estimator in the random design setting under mild assumptions on the covariate/response distributions. In particular, the analysis provides sharp results on the “out-of-sample” prediction error, as opposed to the “in-sample” (fixed design) error. The analysis also reveals the effect of errors in the estimated covariance structure, as well as the effect of modeling errors, neither of which is present in the fixed design setting. The proofs of the main results are based on a simple decomposition lemma combined with concentration inequalities for random vectors and matrices.
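The distinction between in-sample and out-of-sample error can be made concrete with a small simulation. The sketch below (an illustration of the setting, not the paper's analysis) fits a ridge estimator on a random Gaussian design and compares the fixed-design error, averaged over the observed covariates, with the random-design error, taken in expectation over a fresh covariate; the dimensions, regularization parameter, and noise level are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: linear model y = x' beta + noise, with covariates
# drawn i.i.d. from a standard Gaussian (a "random design").
d, n, lam, noise = 10, 100, 1.0, 0.5
beta = rng.standard_normal(d)

X = rng.standard_normal((n, d))              # random design matrix
y = X @ beta + noise * rng.standard_normal(n)

# Ridge regression estimator: (X'X + lam I)^{-1} X'y.
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# In-sample (fixed design) prediction error: average squared prediction
# discrepancy over the n observed design points.
in_sample = np.mean((X @ (beta_hat - beta)) ** 2)

# Out-of-sample (random design) prediction error: expectation over a fresh
# x ~ N(0, I_d), which reduces to ||beta_hat - beta||^2 because E[xx'] = I.
out_of_sample = np.sum((beta_hat - beta) ** 2)

print(f"in-sample error:     {in_sample:.4f}")
print(f"out-of-sample error: {out_of_sample:.4f}")
```

The two quantities generally differ: the in-sample error is measured against the empirical covariance `X'X / n`, whereas the out-of-sample error is measured against the population covariance, so the gap reflects exactly the covariance-estimation effect the abstract refers to.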
Keywords: Linear regression · Ordinary least squares · Ridge regression · Randomized approximation
Mathematics Subject Classification: Primary 62J07 · Secondary 62J05
The authors thank Dean Foster, David McAllester, and Robert Stine for many insightful discussions.