Abstract
This study considers the \(\ell _2\)-regularized error variance estimator, an effective tool for non-sparse linear models. It investigates previously unexplored theoretical properties of this estimator, in particular its asymptotic bias in high-dimensional settings where the number of variables grows with the number of observations. We prove that the estimator is asymptotically unbiased in large-sample settings (\(\lim p/n <1\)), while it is asymptotically biased in general ultra-high dimensional settings (\(\lim p/n >1\)). Moreover, the asymptotic bias of the \(\ell _2\)-regularized estimator in ultra-high dimensional settings is derived exactly when the covariates are independent.
References
Bai, Z. D., & Yin, Y. Q. (1993). Limit of the smallest eigenvalue of a large dimensional sample covariance matrix. The Annals of Probability, 21(3), 1275–1294. https://doi.org/10.1214/aop/1176989118
Dicker, L. H. (2014). Variance estimation in high-dimensional linear models. Biometrika, 101(2), 269–284.
Dicker, L. H., & Erdogdu, M. A. (2016). Maximum likelihood for variance estimation in high-dimensional linear models. In Artificial intelligence and statistics, PMLR (pp. 159–167).
Dobriban, E., Wager, S., et al. (2018). High-dimensional asymptotics of prediction: Ridge regression and classification. The Annals of Statistics, 46(1), 247–279.
Fan, J., Guo, S., & Hao, N. (2012). Variance estimation using refitted cross-validation in ultrahigh dimensional regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(1), 37–65.
Greenshtein, E., & Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of over parametrization. Bernoulli, 10(6), 971–988.
Janson, L., Barber, R. F., & Candes, E. (2017). Eigenprism: Inference for high dimensional signal-to-noise ratios. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(4), 1037.
Javanmard, A., & Montanari, A. (2014). Confidence intervals and hypothesis testing for high-dimensional regression. Journal of Machine Learning Research, 15(82), 2869–2909.
Liu, X., Zheng, S., & Feng, X. (2020). Estimation of error variance via ridge regression. Biometrika, 107(2), 481–488.
Marchenko, V. A., & Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Matematicheskii Sbornik, 114(4), 507–536.
Ning, Y., & Liu, H. (2017). A general theory of hypothesis tests and confidence regions for sparse high dimensional models. The Annals of Statistics, 45(1), 158–195. https://doi.org/10.1214/16-AOS1448
Park, G., Moon, S. J., Park, S., et al. (2021). Learning a high-dimensional linear structural equation model via \(\ell _1\)-regularized regression. The Journal of Machine Learning Research, 22(1), 4607–4647.
Reid, S., Tibshirani, R., & Friedman, J. (2016). A study of error variance estimation in lasso regression. Statistica Sinica, 26(1), 35–67.
Rubio, F., & Mestre, X. (2011). Spectral convergence for a general class of random matrices. Statistics and Probability Letters, 81(5), 592–602.
Silverstein, J. W. (1995). Strong convergence of the empirical distribution of eigenvalues of large dimensional random matrices. Journal of Multivariate Analysis, 55(2), 331–339.
Städler, N., Bühlmann, P., & Van De Geer, S. (2010). \(\ell _1\)-penalization for mixture regression models. Test, 19, 209–256.
Sun, T., & Zhang, C. H. (2012). Scaled sparse linear regression. Biometrika, 99(4), 879–898.
Sun, T., & Zhang, C. H. (2013). Sparse matrix inversion with scaled lasso. The Journal of Machine Learning Research, 14(1), 3385–3418.
van de Geer, S., Bühlmann, P., Ritov, Y., et al. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics, 42(3), 1166–1202. https://doi.org/10.1214/14-AOS1221
Verzelen, N., & Gassiat, E. (2018). Adaptive estimation of high-dimensional signal-to-noise ratios. Bernoulli, 24(4B), 3683–3710. https://doi.org/10.3150/17-BEJ975
Wang, X., Kong, L., & Wang, L. (2022). Estimation of error variance in regularized regression models via adaptive lasso. Mathematics, 10(11), 1937.
Yu, G., & Bien, J. (2019). Estimating the error variance in a high-dimensional linear model. Biometrika, 106(3), 533–546.
Zhang, C. H., & Zhang, S. S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1), 217–242.
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) Grant funded by the Korea government (MSIT) (No. NRF-2021R1C1C2006380) for Semin Choi. This work was also supported by the National Research Foundation of Korea (NRF) Grants funded by the Korea government (MSIT) (NRF-2021R1C1C1004562 and RS-2023-00218231) and an Institute of Information & Communications Technology Planning & Evaluation (IITP) Grant funded by the Korea government (MSIT) [No. 2021-0-01343, Artificial Intelligence Graduate School Program (Seoul National University)] for Gunwoong Park.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Appendices
Appendix 1: Proof for Theorem 3
Proof
We begin by proving the first theoretical result for general \(\Sigma\). For ease of notation, we write \(\lambda = \lambda _{n}\). Recall that the bias of the estimator (6) is expressed as follows:
Simple algebra yields that the denominator of the first term in Eq. (8) is as follows:
Meanwhile, for any fixed \(\lambda _0 \in {\mathbb {R}}^+\), Eqs. (2) and (3) in Dobriban et al. (2018) imply that
For ease of notation, let \(f_n(\lambda ) := \frac{1}{\lambda }\left\{ \frac{n}{p} - 1 + \frac{\lambda }{p} {\text{ tr }}\left( \left( S_n + \lambda I_p\right) ^{-1}\right) \right\}\). Note that \(\lim _{n\rightarrow \infty } f_n(\lambda _0) = \frac{v(-\lambda _0)}{\tau }\) for each \(\lambda _0 \in {\mathbb {R}}^+\). For the existence of \(\lim _{n\rightarrow \infty } f_n(\lambda _n)\), it suffices to show that the sequence \(f_n\), \(n=1,2,\ldots\), converges uniformly on \({\mathbb {R}}^+\), by the Moore–Osgood theorem.
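As a side illustration (not part of the proof), the pointwise convergence of \(f_n(\lambda _0)\) can be observed numerically. The sketch below assumes standard Gaussian entries with \(\Sigma = I_p\), so \(S_n = Z^TZ/n\); the choices \(\tau = 2\) and \(\lambda _0 = 1\) are illustrative. The values of \(f_n(\lambda _0)\) stabilize as \(n\) grows with \(p/n = \tau\) held fixed.

```python
import numpy as np

def f_n(Z: np.ndarray, lam: float) -> float:
    """Evaluate f_n(lam) = (1/lam) * { n/p - 1 + (lam/p) tr((S_n + lam I_p)^{-1}) }."""
    n, p = Z.shape
    S = Z.T @ Z / n
    tr = np.trace(np.linalg.inv(S + lam * np.eye(p)))
    return (n / p - 1 + (lam / p) * tr) / lam

rng = np.random.default_rng(0)
tau, lam0 = 2.0, 1.0   # illustrative ultra-high dimensional ratio tau = p/n > 1
vals = [f_n(rng.standard_normal((n, int(tau * n))), lam0) for n in (200, 400, 800)]
print(vals)            # nearly identical values as n grows
```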
Since the nonzero eigenvalues of \(S_n\) and \(\frac{1}{n}XX^T\) coincide, while \(S_n\) has 0 as an additional eigenvalue with multiplicity \((p-n)\), we have
Simple algebra yields that
Hence, we have
for any fixed \(\lambda _0 \in {\mathbb {R}}^+\) and all sufficiently large n, where \(\Lambda _{\min }^+(\cdot )\) denotes the smallest nonzero eigenvalue. Here, the last inequality follows from the fact that \(\lim _{n\rightarrow \infty }\Lambda _{\min }^+ (Z^TZ/n) = (1-\sqrt{\tau })^2\) by Theorem 2 in Bai and Yin (1993).
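Both spectral facts used above admit a quick numerical check. The sketch below (purely illustrative, assuming standard Gaussian \(Z\) with \(\Sigma = I_p\), so \(S_n = Z^TZ/n\)) verifies that the \(p-n\) smallest eigenvalues of \(S_n\) vanish, that its nonzero eigenvalues match those of \(\frac{1}{n}ZZ^T\), and that the smallest nonzero eigenvalue is close to the Bai–Yin limit \((1-\sqrt{\tau })^2\).

```python
import numpy as np

rng = np.random.default_rng(0)
tau = 2.0                       # illustrative ratio tau = p/n > 1
n = 500
p = int(tau * n)
Z = rng.standard_normal((n, p))

S_n = Z.T @ Z / n               # p x p, rank at most n
G_n = Z @ Z.T / n               # n x n Gram matrix

eig_S = np.sort(np.linalg.eigvalsh(S_n))
eig_G = np.sort(np.linalg.eigvalsh(G_n))

# S_n has eigenvalue 0 with multiplicity p - n (up to numerical error) ...
assert np.allclose(eig_S[:p - n], 0.0, atol=1e-6)
# ... and its n nonzero eigenvalues coincide with those of (1/n) Z Z^T.
assert np.allclose(eig_S[p - n:], eig_G, atol=1e-6)

# Bai-Yin: the smallest nonzero eigenvalue approaches (1 - sqrt(tau))^2.
print(eig_G[0], (1 - np.sqrt(tau)) ** 2)
```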
Similarly, for the derivative \(f_n'(\lambda )=\partial f_n(\lambda )/\partial \lambda\), we observe that
for any fixed \(\lambda _0 \in {\mathbb {R}}^+\) and sufficiently large n.
Hence, \(f_n\) and its derivative are uniformly bounded, so the sequence \(f_n\), \(n=1,2,\ldots\), is uniformly convergent by the Arzelà–Ascoli theorem. Consequently, \(\lim _{n\rightarrow \infty } f_n(\lambda _n)\) exists. In addition, according to Lemma 2.3 in Dobriban et al. (2018), \(\lim _{n\rightarrow \infty } v(-\lambda _n) = v(-c)\) when \(\lim \lambda _n = c\), and hence we have
for all \(c \in [0,\infty )\).
Multiplying both sides by \(\tau\), the limit of the first term of the bias is as follows:
We now focus on the second part of the bias, \(\frac{1}{\Vert {{\varvec{\beta }}}\Vert _2^2} {{\varvec{\beta }}}^T S_n (S_n+\lambda _n I_p)^{-1} {{\varvec{\beta }}}\). As shown above, \(S_n\) has 0 as an eigenvalue with multiplicity \((p-n)\), and \(\Lambda _{\min }^+(S_n) \ge (1-\sqrt{\tau })^2\rho _{\min }/2\) for all sufficiently large n. Let \(\{e_1,\ldots ,e_{n}\}\) denote the eigenvectors of \(S_n\) corresponding to the nonzero eigenvalues and \(\{e_{n+1},\ldots ,e_{p}\}\) the eigenvectors corresponding to the eigenvalue 0. Then \({{\varvec{\beta }}}\) decomposes as \({{\varvec{\beta }}}= {{\varvec{\beta }}}_{E_1} + {{\varvec{\beta }}}_{E_2}\), where \(E_1\) and \(E_2\) are the subspaces of \({\mathbb {R}}^p\) spanned by \(\{e_1,\ldots ,e_{n}\}\) and \(\{e_{n+1},\ldots ,e_{p}\}\), respectively. Hence, we have
for all sufficiently large n.
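The eigenspace decomposition above can be illustrated numerically. The sketch below (an illustration only, with Gaussian entries and arbitrary illustrative values of \(n\), \(p\), \(\lambda\), and \({{\varvec{\beta }}}\)) splits \({{\varvec{\beta }}}\) into its components in \(E_1\) (the span of eigenvectors with nonzero eigenvalues) and \(E_2\) (the null space of \(S_n\)), and confirms that only \({{\varvec{\beta }}}_{E_1}\) contributes to the quadratic form \({{\varvec{\beta }}}^T S_n (S_n+\lambda I_p)^{-1} {{\varvec{\beta }}}\).

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, lam = 40, 100, 0.5          # illustrative sizes with p > n
X = rng.standard_normal((n, p))
S_n = X.T @ X / n
beta = rng.standard_normal(p)

vals, vecs = np.linalg.eigh(S_n)  # eigenvalues in ascending order
nonzero = vals > 1e-10            # the n eigenvectors spanning E_1
beta_E1 = vecs[:, nonzero] @ (vecs[:, nonzero].T @ beta)
beta_E2 = beta - beta_E1          # component in the null space E_2

# S_n annihilates beta_E2, so only beta_E1 contributes to the quadratic form.
R = np.linalg.inv(S_n + lam * np.eye(p))
q_full = beta @ S_n @ R @ beta
q_E1 = beta_E1 @ S_n @ R @ beta_E1
assert np.isclose(q_full, q_E1)
assert np.allclose(S_n @ beta_E2, 0.0, atol=1e-8)
```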
Applying Eqs. (8) and (10), we obtain
Since \(E_1\) is determined by the design matrix \(X = Z\Sigma ^{1/2}\), where Z is randomly generated and \(\Sigma\) has full rank, it holds that \(\liminf _{n\rightarrow \infty } \frac{\Vert {{\varvec{\beta }}}_{E_1}\Vert _2^2}{\Vert {{\varvec{\beta }}}\Vert _2^2} > 0\) almost surely. This completes the proof of the theoretical result for general \(\Sigma\). \(\square\)
Appendix 2: Proof for Theorem 4
Proof
We now prove the second theoretical result with \(\Sigma = I_p\). Eq. (10) also holds for \(\Sigma =I_p\), so it suffices to focus on the term \(\frac{1}{\Vert {{\varvec{\beta }}}\Vert _2^2} {{\varvec{\beta }}}^T S_n (S_n+\lambda _n I_p)^{-1} {{\varvec{\beta }}}= 1 - \frac{\lambda _n}{\Vert {{\varvec{\beta }}}\Vert _2^2} {{\varvec{\beta }}}^T (S_n+\lambda _n I_p)^{-1} {{\varvec{\beta }}}\). First, we apply Theorem 1 in Rubio and Mestre (2011) with \(\Theta = I_p/p\), which implies
for any fixed \(\lambda _0 \in {\mathbb {R}}^+\), where \(x_n(\lambda _0)>0\) is a deterministic sequence satisfying a certain fixed-point equation for each n.
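The resolvent identity invoked above, \({{\varvec{\beta }}}^T S_n (S_n+\lambda I_p)^{-1} {{\varvec{\beta }}}= \Vert {{\varvec{\beta }}}\Vert _2^2 - \lambda {{\varvec{\beta }}}^T (S_n+\lambda I_p)^{-1} {{\varvec{\beta }}}\), follows from \(S_n(S_n+\lambda I_p)^{-1} = I_p - \lambda (S_n+\lambda I_p)^{-1}\) and can be checked numerically. The sketch below uses Gaussian entries and illustrative values of \(n\), \(p\), and \(\lambda\).

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, lam = 40, 100, 0.5                   # illustrative sizes with p > n
X = rng.standard_normal((n, p))
S_n = X.T @ X / n
beta = rng.standard_normal(p)

R = np.linalg.inv(S_n + lam * np.eye(p))   # resolvent (S_n + lam I_p)^{-1}
lhs = beta @ S_n @ R @ beta / (beta @ beta)
rhs = 1 - lam * (beta @ R @ beta) / (beta @ beta)
assert np.isclose(lhs, rhs)                # the two normalized forms agree
```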
By Eq. (9), it holds that
for any fixed \(\lambda _0 \in {\mathbb {R}}^+\), and hence we can define \(x(\lambda _0): = \lim _{n\rightarrow \infty } x_n(\lambda _0)\).
Furthermore, by Theorem 1 in Rubio and Mestre (2011) with \(\Theta = {{\varvec{\beta }}}{{\varvec{\beta }}}^T/\Vert {{\varvec{\beta }}}\Vert _2^2\),
for any fixed \(\lambda _0 \in {\mathbb {R}}^+\), which implies that
for any fixed \(\lambda _0 \in {\mathbb {R}}^+\). Following the same argument as for Eq. (10), the existence of the limit of \(g_n(\lambda _n) := \frac{\lambda _n}{\Vert {{\varvec{\beta }}}\Vert _2^2}{{\varvec{\beta }}}^T(S_n+\lambda _n I_p)^{-1}{{\varvec{\beta }}}=1-\frac{1}{\Vert {{\varvec{\beta }}}\Vert _2^2}{{\varvec{\beta }}}^TS_n(S_n+\lambda _n I_p)^{-1}{{\varvec{\beta }}}\) is confirmed by the following inequalities:
for any fixed \(\lambda _0 \in {\mathbb {R}}^+\) and all sufficiently large n.
Then, it holds that
Hence, we conclude that
as \(n\rightarrow \infty\). \(\square\)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Choi, S., Park, G. Asymptotic bias of the \(\ell _2\)-regularized error variance estimator. J. Korean Stat. Soc. 53, 132–148 (2024). https://doi.org/10.1007/s42952-023-00239-y