Foundations of Computational Mathematics, Volume 14, Issue 3, pp 569–600

Random Design Analysis of Ridge Regression

  • Daniel Hsu
  • Sham M. Kakade
  • Tong Zhang


This work gives a simultaneous analysis of both the ordinary least squares estimator and the ridge regression estimator in the random design setting under mild assumptions on the covariate/response distributions. In particular, the analysis provides sharp results on the “out-of-sample” prediction error, as opposed to the “in-sample” (fixed design) error. The analysis also reveals the effect of errors in the estimated covariance structure, as well as the effect of modeling errors, neither of which is present in the fixed design setting. The proofs of the main results are based on a simple decomposition lemma combined with concentration inequalities for random vectors and matrices.
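The distinction between in-sample and out-of-sample error can be illustrated with a small simulation. The sketch below is not taken from the paper: the function names `ridge_fit` and `simulate`, the isotropic Gaussian covariates, the well-specified linear model, and all parameter values are assumptions chosen for illustration. Under those assumptions the population covariance is the identity, so the out-of-sample excess error reduces to the squared Euclidean distance between the estimated and true weights, while the in-sample excess error is measured through the empirical covariance.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge estimator with the empirical covariance normalized by n (lam=0 gives OLS)."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

def simulate(n=50, d=10, lam=0.1, noise=0.5, seed=0):
    """Return (in-sample, out-of-sample) excess squared error for one random draw."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)            # assumed true linear model (no modeling error)
    X = rng.normal(size=(n, d))            # isotropic covariates: population covariance = I
    y = X @ w_true + noise * rng.normal(size=n)
    diff = ridge_fit(X, y, lam) - w_true
    in_sample = np.mean((X @ diff) ** 2)   # error measured on the observed design points
    out_of_sample = diff @ diff            # E[(x . diff)^2] for a fresh x, since covariance = I
    return in_sample, out_of_sample

in_err, out_err = simulate()
print(f"in-sample: {in_err:.4f}, out-of-sample: {out_err:.4f}")
```

The two quantities differ only through the gap between the empirical and population covariance, which is the kind of effect the abstract says is absent from a fixed design analysis.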


Linear regression, Ordinary least squares, Ridge regression, Randomized approximation

Mathematics Subject Classification

Primary 62J07; Secondary 62J05



The authors thank Dean Foster, David McAllester, and Robert Stine for many insightful discussions.



Copyright information

© SFoCM 2014

Authors and Affiliations

  1. Department of Computer Science, Columbia University, New York, USA
  2. Microsoft Research, Cambridge, USA
  3. Department of Statistics, Rutgers University, Piscataway, USA
