Abstract
This work gives a simultaneous analysis of both the ordinary least squares estimator and the ridge regression estimator in the random design setting under mild assumptions on the covariate/response distributions. In particular, the analysis provides sharp results on the “out-of-sample” prediction error, as opposed to the “in-sample” (fixed design) error. The analysis also reveals the effect of errors in the estimated covariance structure, as well as the effect of modeling errors, neither of which arises in the fixed design setting. The proofs of the main results are based on a simple decomposition lemma combined with concentration inequalities for random vectors and matrices.
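To make the distinction concrete, the following small simulation (an illustrative sketch, not taken from the paper; the dimensions, regularization parameter, and noise level are arbitrary assumptions) computes the ridge estimator on a random design and compares its in-sample (fixed design) error with its out-of-sample prediction error.

```python
# Sketch: in-sample (fixed design) vs. out-of-sample (random design)
# error of the ridge estimator.  All parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, d, lam, noise = 100, 20, 1.0, 0.5

beta = rng.normal(size=d) / np.sqrt(d)       # true linear predictor
X = rng.normal(size=(n, d))                  # random design matrix
y = X @ beta + noise * rng.normal(size=n)    # noisy responses

# Ridge estimator: beta_hat = (X'X + lam*I)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# In-sample error: averaged over the observed design points.
in_sample = np.mean((X @ (beta_hat - beta)) ** 2)

# Out-of-sample error: E[(x'(beta_hat - beta))^2] over a fresh x.
# The population covariance here is the identity, so this equals the
# squared Euclidean norm of the parameter error.
out_of_sample = np.sum((beta_hat - beta) ** 2)

print(f"in-sample error:     {in_sample:.4f}")
print(f"out-of-sample error: {out_of_sample:.4f}")
```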
References
N. Ailon and B. Chazelle. Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform. SIAM J. Comput., 39(1):302–322, 2009.
J.-Y. Audibert and O. Catoni. Linear regression through PAC-Bayesian truncation, 2010. arXiv:1010.0072.
J.-Y. Audibert and O. Catoni. Robust linear least squares regression. The Annals of Statistics, 39(5):2766–2794, 2011.
A. Caponnetto and E. De Vito. Optimal rates for the regularized least-squares algorithm. Foundations of Computational Mathematics, 7(3):331–368, 2007.
O. Catoni. Statistical Learning Theory and Stochastic Optimization, Lectures on Probability and Statistics, Ecole d’Eté de Probabilités de Saint-Flour XXXI - 2001, volume 1851 of Lecture Notes in Mathematics. Springer, 2004.
P. Drineas and M. W. Mahoney. Effective resistances, statistical leverage, and applications to linear equation solving, 2010. arXiv:1005.3097.
P. Drineas, M. W. Mahoney, S. Muthukrishnan, and T. Sarlós. Faster least squares approximation. Numerische Mathematik, 117(2):219–249, 2010.
L. Györfi, M. Kohler, A. Krzyżak, and H. Walk. A Distribution-Free Theory of Nonparametric Regression. Springer, 2002.
A. E. Hoerl. Application of ridge analysis to regression problems. Chemical Engineering Progress, 58:54–59, 1962.
R. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1985.
D. Hsu, S. M. Kakade, and T. Zhang. A tail inequality for quadratic forms of subgaussian random vectors, 2011. arXiv:1110.2842.
D. Hsu, S. M. Kakade, and T. Zhang. Tail inequalities for sums of random matrices that depend on the intrinsic dimension. Electronic Communications in Probability, 17(14):1–13, 2012.
D. Hsu and S. Sabato. Loss minimization and parameter estimation with heavy tails, 2013. arXiv:1307.1827.
V. Koltchinskii. Local Rademacher complexities and oracle inequalities in risk minimization. The Annals of Statistics, 34(6):2593–2656, 2006.
B. Laurent and P. Massart. Adaptive estimation of a quadratic functional by model selection. The Annals of Statistics, 28(5):1302–1338, 2000.
E. L. Lehmann and G. Casella. Theory of Point Estimation. Springer, second edition, 1998.
M. Nussbaum. Minimax risk: Pinsker bound. In S. Kotz, editor, Encyclopedia of Statistical Sciences, Update Volume 3, pages 451–460. Wiley, New York, 1999.
V. Rokhlin and M. Tygert. A fast randomized algorithm for overdetermined linear least-squares regression. Proc. Natl. Acad. Sci. USA, 105(36):13212–13217, 2008.
S. Smale and D.-X. Zhou. Learning theory estimates via integral operators and their approximations. Constructive Approximation, 26:153–172, 2007.
I. Steinwart, D. Hush, and C. Scovel. Optimal rates for regularized least squares regression. In Proceedings of the 22nd Annual Conference on Learning Theory, pages 79–93, 2009.
G. W. Stewart and J.-G. Sun. Matrix Perturbation Theory. Academic Press, 1990.
C. J. Stone. Optimal global rates of convergence for nonparametric regression. The Annals of Statistics, 10:1040–1053, 1982.
T. Zhang. Learning bounds for kernel regression using effective data dimensionality. Neural Computation, 17:2077–2098, 2005.
Acknowledgments
The authors thank Dean Foster, David McAllester, and Robert Stine for many insightful discussions.
Communicated by Tomaso Poggio.
Appendix: Probability Tail Inequalities
The following probability tail inequalities are used in our analysis. These specific inequalities were chosen to satisfy the general conditions set out in Sect. 2.4; however, the analysis can be specialized or generalized by substituting other tail inequalities of the same type.
The first tail inequality is for positive semidefinite quadratic forms of a sub-Gaussian random vector. It generalizes a standard tail inequality for Gaussian random vectors based on linear combinations of \(\chi ^2\) random variables [15].
Lemma 8
(Quadratic forms of a sub-Gaussian random vector [11]) Let \(\xi \) be a random vector taking values in \(\mathbb {R}^n\) such that for some \(c \ge 0\),
\[ \mathbb {E}\bigl [\exp (\langle \alpha , \xi \rangle )\bigr ] \le \exp \bigl ( c \Vert \alpha \Vert ^2 / 2 \bigr ) \quad \text {for all } \alpha \in \mathbb {R}^n . \]
For all symmetric positive semidefinite matrices \(K \succeq 0\), and all \(t > 0\),
\[ \Pr \Bigl [ \xi ^{\top } K \xi > c \bigl ( \mathrm {tr}(K) + 2 \sqrt{\mathrm {tr}(K^2) \, t} + 2 \Vert K \Vert t \bigr ) \Bigr ] \le \mathrm {e}^{-t} . \]
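As an illustrative sanity check (not part of the paper), the following Monte Carlo sketch instantiates the lemma with a standard Gaussian \(\xi \), for which the sub-Gaussian condition above holds with \(c = 1\), and verifies that the empirical tail probability falls below \(\mathrm {e}^{-t}\).

```python
# Monte Carlo check of Lemma 8 for a standard Gaussian vector (c = 1).
# The dimension, trial count, and choice of K are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, trials, t = 10, 200_000, 2.0

A = rng.normal(size=(n, n))
K = A @ A.T                                     # a symmetric PSD matrix
thresh = np.trace(K) + 2 * np.sqrt(np.trace(K @ K) * t) \
         + 2 * np.linalg.eigvalsh(K).max() * t  # threshold with c = 1

xi = rng.normal(size=(trials, n))
quad = np.einsum('ti,ij,tj->t', xi, K, xi)      # quadratic form xi' K xi

print(f"empirical tail: {np.mean(quad > thresh):.2e}")
print(f"bound e^-t:     {np.exp(-t):.2e}")
```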
The next lemma is a tail inequality for sums of bounded random vectors; it is a standard application of Bernstein’s inequality.
Lemma 9
(Vector Bernstein bound; see, e.g., [11]) Let \(x_1, x_2, \ldots , x_n\) be independent random vectors such that
\[ \sum _{i=1}^n \mathbb {E}\bigl [ \Vert x_i \Vert ^2 \bigr ] \le v \quad \text {and} \quad \Vert x_i \Vert \le r \]
for all \(i = 1, 2, \ldots , n\), almost surely. Let \(s := x_1 + x_2 + \cdots + x_n\). For all \(t > 0\),
\[ \Pr \Bigl [ \Vert s - \mathbb {E}[s] \Vert > \sqrt{v} \bigl ( 1 + \sqrt{8t} \bigr ) + \tfrac{4}{3} r t \Bigr ] \le \mathrm {e}^{-t} . \]
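The following sketch (again an illustration under assumed parameters, not from the paper) checks the bound empirically with each \(x_i\) drawn uniformly from the unit sphere, so that \(r = 1\), \(v = n\), and \(\mathbb {E}[s] = 0\) by symmetry; the bound is conservative in this setting, so the empirical tail should fall well below \(\mathrm {e}^{-t}\).

```python
# Monte Carlo check of Lemma 9 with x_i uniform on the unit sphere:
# ||x_i|| = 1 = r, sum_i E||x_i||^2 = n = v, and E[s] = 0 by symmetry.
import numpy as np

rng = np.random.default_rng(0)
n, d, trials, t = 50, 5, 20_000, 2.0
r, v = 1.0, float(n)

x = rng.normal(size=(trials, n, d))
x /= np.linalg.norm(x, axis=2, keepdims=True)  # project to unit sphere
s = x.sum(axis=1)                              # s = x_1 + ... + x_n

thresh = np.sqrt(v) * (1 + np.sqrt(8 * t)) + (4.0 / 3.0) * r * t
print(f"empirical tail: {np.mean(np.linalg.norm(s, axis=1) > thresh):.2e}")
print(f"bound e^-t:     {np.exp(-t):.2e}")
```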
The last tail inequality concerns the spectral accuracy of an empirical second moment matrix.
Lemma 10
(Matrix Bernstein bound [12]) Let \(X\) be a random symmetric matrix, and \(r > 0\), \(v > 0\), and \(k > 0\) be such that, almost surely,
\[ \mathbb {E}[X] = 0 , \quad \lambda _{\max }(X) \le r , \quad \lambda _{\max }\bigl ( \mathbb {E}[X^2] \bigr ) \le v , \quad \mathrm {tr}\bigl ( \mathbb {E}[X^2] \bigr ) \le v k . \]
If \(X_1, X_2, \ldots , X_n\) are independent copies of \(X\), then for any \(t > 0\),
\[ \Pr \Biggl [ \lambda _{\max }\Biggl ( \frac{1}{n} \sum _{i=1}^n X_i \Biggr ) > \sqrt{\frac{2 v t}{n}} + \frac{r t}{3 n} \Biggr ] \le k t \bigl ( \mathrm {e}^t - t - 1 \bigr )^{-1} . \]
If \(t \ge 2.6\), then \(t (\mathrm {e}^t - t - 1)^{-1} \le \mathrm {e}^{-t/2}\).
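To illustrate the hypotheses concretely (a sketch with assumed parameters, not taken from the paper), one can take \(X = d \, u u^{\top } - I\) with \(u\) uniform on the unit sphere in \(\mathbb {R}^d\): then \(\mathbb {E}[X] = 0\), \(\lambda _{\max }(X) = d - 1\), and \(\mathbb {E}[X^2] = (d-1) I\), so the lemma applies with \(r = d - 1\), \(v = d - 1\), and \(k = d\).

```python
# Monte Carlo check of Lemma 10 with X = d*uu' - I, u uniform on the
# unit sphere in R^d: E[X] = 0, lambda_max(X) = d - 1, E[X^2] = (d-1)I,
# so r = d - 1, v = d - 1, k = d satisfy the hypotheses.
import numpy as np

rng = np.random.default_rng(0)
d, n, trials, t = 5, 100, 10_000, 5.0
r, v, k = float(d - 1), float(d - 1), float(d)

u = rng.normal(size=(trials, n, d))
u /= np.linalg.norm(u, axis=2, keepdims=True)
S = np.einsum('tni,tnj->tij', u, u) / n        # (1/n) sum_i u_i u_i'
M = d * S - np.eye(d)                          # (1/n) sum_i X_i
lam_max = np.linalg.eigvalsh(M)[:, -1]         # largest eigenvalue per trial

thresh = np.sqrt(2 * v * t / n) + r * t / (3 * n)
bound = k * t / (np.exp(t) - t - 1)
print(f"empirical tail: {np.mean(lam_max > thresh):.2e}")
print(f"bound:          {bound:.2e}")
```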