Abstract
Sharp oracle inequalities for the prediction error and \(\ell_1\)-error of the Lasso are given. We highlight the ingredients used to establish them; these are also recorded for later reference, where the results are extended to other norms and other loss functions.
Notes
- 1.
A suitable notation that expresses the non-uniqueness is \(\beta^{0} \in \mathop{\mathrm{argmin}}\{\|\beta\|_{1}:\ X\beta = f^{0}\}\). In our analysis, non-uniqueness is not a major concern.
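The minimization in this note is the basis-pursuit problem: minimize \(\|\beta\|_1\) subject to \(X\beta = f^{0}\). It can be checked numerically, since splitting \(\beta\) into positive and negative parts turns it into a linear program. A minimal sketch using `scipy.optimize.linprog`; the data `X`, `beta_true`, and `f0` below are made-up illustrations, not from the chapter:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, p = 3, 5
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, 0.0, 0.0, -2.0, 0.0])  # a sparse solution of X beta = f0
f0 = X @ beta_true

# Basis pursuit as an LP: beta = u - v with u, v >= 0, minimize sum(u) + sum(v)
# subject to X u - X v = f0.
c = np.ones(2 * p)
A_eq = np.hstack([X, -X])
res = linprog(c, A_eq=A_eq, b_eq=f0, bounds=(0, None))
beta_hat = res.x[:p] - res.x[p:]
```

Any minimizer `beta_hat` satisfies the constraint exactly (up to solver tolerance) and has \(\ell_1\)-norm no larger than that of `beta_true`; when the minimizer is non-unique, the solver simply returns one element of the argmin, in line with the note.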
- 2.
If \(X_{1},\ldots,X_{n}\) are n elements of some space \(\mathcal{X}\) and \(f: \mathcal{X} \rightarrow \mathbb{R}\) is some real-valued function on \(\mathcal{X}\), one may view \(\sum _{i=1}^{n}f^{2}(X_{i})/n\) as the squared \(L_{2}(P_{n})\)-norm of f, with \(P_{n} =\sum _{i=1}^{n}\delta _{X_{i}}/n\) being the measure that puts equal mass \(1/n\) at each \(X_{i}\) (\(i = 1,\ldots,n\)). Let us denote the \(L_{2}(P_{n})\)-norm by \(\|\cdot \|_{2,P_{n}}\). We have abbreviated this to \(\|\cdot \|_{P_{n}}\) and then further abbreviated it to \(\|\cdot \|_{n}\). Finally, we identified f with the vector \((\,f(X_{1}),\ldots,f(X_{n}))^{T} \in \mathbb{R}^{n}\).
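The identification of f with a vector in \(\mathbb{R}^{n}\) is easy to make concrete: the squared empirical norm is just the mean of \(f^{2}\) over the design points, which equals the squared Euclidean norm of the vector \((f(X_{1}),\ldots,f(X_{n}))^{T}\) divided by n. A small numpy sketch, with made-up design points and a made-up function f:

```python
import numpy as np

# Illustrative design points X_1, ..., X_n and a function f on them
X_pts = np.array([0.5, 1.0, 2.0, 3.5])
f = lambda x: x ** 2 - 1.0
n = len(X_pts)

# Squared L2(P_n)-norm: average of f^2 under the empirical measure P_n
norm_n_sq = np.mean(f(X_pts) ** 2)

# Identifying f with the vector (f(X_1), ..., f(X_n))^T in R^n:
f_vec = f(X_pts)
norm_from_vector = np.linalg.norm(f_vec) ** 2 / n
```

Both computations give the same number, which is exactly the identification the note describes.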
- 3.
Or non-sparsity actually.
- 4.
The “argmin” argument takes as its starting point the inequality \(\|Y - X\hat{\beta }\|_{n}^{2} + 2\lambda \|\hat{\beta }\|_{1} \leq \| Y - X\beta \|_{n}^{2} + 2\lambda \|\beta \|_{1}\ \forall \ \beta\).
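This starting inequality says only that \(\hat{\beta}\) minimizes the Lasso objective \(\|Y - X\beta\|_{n}^{2} + 2\lambda\|\beta\|_{1}\), so the objective at \(\hat{\beta}\) is at most the objective at any other \(\beta\). As an illustration (not the chapter's code), one can compute \(\hat{\beta}\) with a plain proximal-gradient (ISTA) loop in numpy and check the inequality against a few candidate vectors; the toy data below are assumptions for the sketch:

```python
import numpy as np

def lasso_objective(Y, X, beta, lam):
    """Lasso objective ||Y - X beta||_n^2 + 2*lam*||beta||_1, with ||v||_n^2 = ||v||_2^2 / n."""
    n = X.shape[0]
    return np.sum((Y - X @ beta) ** 2) / n + 2 * lam * np.sum(np.abs(beta))

def lasso_ista(Y, X, lam, n_iter=5000):
    """Minimize the Lasso objective by proximal gradient (ISTA); a simple sketch, not the chapter's method."""
    n, p = X.shape
    beta = np.zeros(p)
    t = n / (2 * np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the smooth part's gradient
    for _ in range(n_iter):
        z = beta - t * (2 / n) * X.T @ (X @ beta - Y)          # gradient step on ||Y - X beta||_n^2
        beta = np.sign(z) * np.maximum(np.abs(z) - 2 * lam * t, 0.0)  # soft-threshold: prox of 2*lam*||.||_1
    return beta

# Toy data (illustrative, not from the chapter)
rng = np.random.default_rng(1)
n, p = 50, 10
X = rng.standard_normal((n, p))
beta0 = np.zeros(p); beta0[:2] = [2.0, -1.0]
Y = X @ beta0 + 0.1 * rng.standard_normal(n)
lam = 0.1
beta_hat = lasso_ista(Y, X, lam)
```

The inequality then holds numerically: `lasso_objective(Y, X, beta_hat, lam)` is no larger than the objective at the zero vector, at `beta0`, or at random candidates, which is the inequality the “argmin” argument starts from.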
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this chapter
van de Geer, S. (2016). The Lasso. In: Estimation and Testing Under Sparsity. Lecture Notes in Mathematics(), vol 2159. Springer, Cham. https://doi.org/10.1007/978-3-319-32774-7_2
Print ISBN: 978-3-319-32773-0
Online ISBN: 978-3-319-32774-7