Abstract
High-dimensional data subject to heavy-tailed phenomena and heterogeneity are commonly encountered in various scientific fields and bring new challenges to classical statistical methods. In this paper, we combine the asymmetric squared loss with a Huber-type robustification to develop robust expectile regression for ultrahigh-dimensional heavy-tailed heterogeneous data. Unlike the classical Huber method, we introduce two different tuning parameters, one on each side, to account for possible asymmetry, and we allow them to diverge to reduce the bias induced by the robust approximation. In the regularized framework, we adopt general folded concave penalty functions, such as the SCAD and MCP penalties, for the sake of bias reduction. We investigate the finite-sample properties of the corresponding estimator and characterize how our method trades off estimation accuracy against heavy-tailedness. Based on our theoretical study, we also propose an efficient first-order optimization algorithm applied after a local linear approximation of the non-convex problem. Simulation studies under various distributions and a real data example demonstrate the satisfactory performance of our method in coefficient estimation, model selection and heterogeneity detection.
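To make the construction concrete, the sketch below (ours, in Python) evaluates the asymmetric squared loss and one plausible Huber-type truncation of it with separate levels on the two sides; the function names, the convention \(c_l<0<c_u\) and the exact piecewise form are illustrative assumptions rather than the paper's definition in Sect. 2.

```python
import numpy as np

def expectile_loss(r, alpha):
    """Asymmetric squared loss: alpha * r^2 if r >= 0, (1 - alpha) * r^2 if r < 0."""
    r = np.asarray(r, dtype=float)
    w = np.where(r >= 0, alpha, 1.0 - alpha)
    return w * r ** 2

def robust_expectile_loss(r, alpha, c_u, c_l):
    """Huber-type truncation with separate levels c_l < 0 < c_u:
    quadratic on [c_l, c_u], linear growth outside, joined so that the
    loss remains continuously differentiable."""
    r = np.asarray(r, dtype=float)
    loss = expectile_loss(r, alpha)
    loss = np.where(r > c_u, alpha * (2.0 * c_u * r - c_u ** 2), loss)
    loss = np.where(r < c_l, (1.0 - alpha) * (2.0 * c_l * r - c_l ** 2), loss)
    return loss

# Example: alpha = 0.7 weights positive residuals more heavily; residuals beyond
# the truncation levels are penalized only linearly, which limits the influence
# of heavy-tailed errors.
r = np.array([-5.0, -0.5, 0.0, 0.5, 5.0])
print(robust_expectile_loss(r, alpha=0.7, c_u=2.0, c_l=-2.0))
```

Letting the truncation levels grow with the sample size, as discussed in the abstract, shrinks the gap between this robustified loss and the plain asymmetric squared loss.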
References
Agarwal A, Negahban S, Wainwright MJ (2012) Fast global convergence of gradient methods for high-dimensional statistical recovery. The Annals of Statistics 40(5):2452–2482
Aigner DJ, Amemiya T, Poirier DJ (1976) On the estimation of production frontiers: maximum likelihood estimation of the parameters of a discontinuous density function. International Economic Review 17(2):377–396
Aitkin M (1987) Modelling variance heterogeneity in normal regression using GLIM. Journal of the Royal Statistical Society: Series C (Applied Statistics) 36(3):332–339
Bickel PJ (1984) Robust regression based on infinitesimal neighbourhoods. The Annals of Statistics 12:1349–1368
De Rossi G, Harvey A (2009) Quantiles, expectiles and splines. Journal of Econometrics 152(2):179–185
Efron B (1991) Regression percentiles using asymmetric squared error loss. Statistica Sinica 93–125
Fan J, Li Q, Wang Y (2017) Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79(1):247–265
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96(456):1348–1360
Fan J, Xue L, Zou H (2014) Strong oracle optimality of folded concave penalized estimation. The Annals of Statistics 42(3):819–849
Fu A, Narasimhan B, Boyd S (2017) CVXR: An R package for disciplined convex optimization. https://web.stanford.edu/~boyd/papers/cvxr_paper.html
Gu Y, Zou H (2016) High-dimensional generalizations of asymmetric least squares regression and their applications. The Annals of Statistics 44(6):2661–2694
Guo C, Yang H, Lv J (2017) Robust variable selection in high-dimensional varying coefficient models based on weighted composite quantile regression. Statistical Papers 58(4):1009–1033
Grant MC, Boyd SP (2013) CVX: Matlab software for disciplined convex programming, version 2.0 beta. http://cvxr.com/cvx
Grant MC, Boyd SP (2008) Graph implementations for nonsmooth convex programs. In: Recent advances in learning and control, pp 95–110
Huang CC, Liu K, Pope RM, Du P, Lin S, Rajamannan NM et al (2011) Activated TLR signaling in atherosclerosis among women with lower Framingham risk score: the Multi-Ethnic Study of Atherosclerosis. PLoS One 6(6):e21067
Huber PJ (1964) Robust estimation of a location parameter. The Annals of Mathematical Statistics 35:73–101
Huber PJ (1983) Minimax aspects of bounded-influence regression. Journal of the American Statistical Association 78(381):66–72
Kim M, Lee S (2016) Nonlinear expectile regression with application to value-at-risk and expected shortfall estimation. Computational Statistics & Data Analysis 94:1–19
Liu Y, Zeng P, Lin L (2020) Degrees of freedom for regularized regression with Huber loss and linear constraints. Statistical Papers 1–23
Loh PL, Wainwright MJ (2015) Regularized M-estimators with nonconvexity: statistical and algorithmic theory for local optima. Journal of Machine Learning Research 16:559–616
Maronna RA, Martin RD, Yohai VJ (2019) Robust statistics: theory and methods (with R). John Wiley & Sons
Massart P (2007) Concentration inequalities and model selection. Springer, Berlin
Newey WK, Powell JL (1987) Asymmetric least squares estimation and testing. Econometrica: Journal of the Econometric Society 819–847
Parikh N, Boyd S (2014) Proximal algorithms. Foundations and Trends in Optimization 1(3):127–239
Rigby RA, Stasinopoulos DM (1996) A semi-parametric additive model for variance heterogeneity. Statistics and Computing 6(1):57–65
Rivasplata O (2012) Subgaussian random variables: An expository note. Internet publication, PDF
Smucler E, Yohai VJ (2017) Robust and sparse estimators for linear regression models. Computational Statistics & Data Analysis 111:116–130
Sobotka F, Kneib T (2012) Geoadditive expectile regression. Computational Statistics & Data Analysis 56(4):755–767
Tibshirani R (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58(1):267–288
Waltrup LS, Sobotka F, Kneib T, Kauermann G (2015) Expectile and quantile regression: David and Goliath? Statistical Modelling 15(5):433–456
Wang L, Wu Y, Li R (2012) Quantile regression for analyzing heterogeneity in ultra-high dimension. Journal of the American Statistical Association 107(497):214–222
Wang L, Zheng C, Zhou W, Zhou W (2020) A new principle for tuning-free Huber regression. Statistica Sinica
Yao Q, Tong H (1996) Asymmetric least squares regression estimation: a nonparametric approach. Journal of Nonparametric Statistics 6(2–3):273–292
Zhang CH (2010) Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics 38(2):894–942
Zhao J, Chen Y, Zhang Y (2018) Expectile regression for analyzing heteroscedasticity in high dimension. Statistics & Probability Letters 137:304–311
Zou H, Li R (2008) One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics 36(4):1509–1533
Ziegel JF (2016) Coherence and elicitability. Mathematical Finance 26(4):901–918
Acknowledgements
The authors thank the Editor and anonymous referees for their valuable comments and suggestions. This research was partly supported by the National Statistical Science Research Project (No. 2018LY30), the Zhejiang Provincial Natural Science Foundation (No. LY18A010005) and the Research Project of Humanities and Social Science of the Ministry of Education of China (No. 17YJA910003).
Appendix
Lemma 1
The loss function \(\phi (\cdot )\) defined in (2.1) is continuously differentiable. Moreover, for any \(r,r_0\in \mathbb {R}\), we have
Proof
Details can be found in Gu and Zou (2016).
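For the reader's convenience, with the convention \(\phi _{\alpha }(r)=|\alpha -\mathbb {I}(r<0)|\,r^2\) (our restatement; the definition in (2.1) may use a different but equivalent normalization), the bound in question takes the form \(\min \{\alpha ,1-\alpha \}(r-r_0)^2\le \phi _{\alpha }(r)-\phi _{\alpha }(r_0)-\phi '_{\alpha }(r_0)(r-r_0)\le \max \{\alpha ,1-\alpha \}(r-r_0)^2\), where \(\phi '_{\alpha }(r)=2|\alpha -\mathbb {I}(r<0)|\,r\).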
Proof of Theorem 1
For simplicity and convenience of notation, we suppress the dependence on the pre-determined parameters \(C_u,~C_l\) and write \({\tilde{\varvec{\beta }}}^*:=\varvec{\beta }^*(C_u,C_l)\) for short.
where the last inequality follows by Condition 2.
On the other hand, \({\tilde{\varvec{\beta }}}^*=\underset{\varvec{\beta } \in \mathbb {R}^{p}}{\arg \min }~\mathbb {E}\psi _{\alpha }(y-\mathbf {x}'\varvec{\beta };C_u,C_l)\). Then
where \(g_{\alpha }(\cdot )\) is defined as follows
Note \(g_{\alpha }(r)\) is continuous and differentiable and
So by the mean value theorem, there exists some \({\tilde{\varvec{\beta }}}\) on the line segment between \({\tilde{\varvec{\beta }}}^*\) and \(\varvec{\beta }^*\) such that
where \({\tilde{r}}=|y-\mathbf {x}'{\tilde{\varvec{\beta }}}|\) and \(C=\min \{C_u,|C_l|\}\).
Denote by \(P_{\epsilon }\) the conditional distribution of \(\epsilon \) given \(\mathbf {x}\) and by \(\mathbb {E}_{\epsilon }\) the corresponding conditional expectation. Then we have
where the second to last inequality is obtained by Markov's inequality.
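Concretely, assuming Condition 1 provides a conditional kth moment bound \(\mathbb {E}_{\epsilon }[|\epsilon |^k]\le M_k\) (our reading of the condition), the Markov step is of the form \(P_{\epsilon }(|\epsilon |>t)\le \mathbb {E}_{\epsilon }[|\epsilon |^k]/t^k\le M_k/t^k\) for any \(t>0\).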
Therefore, \(\mathbb {E}[\phi _{\alpha }(y-\mathbf {x}'{\tilde{\varvec{\beta }}}^*)-\phi _{\alpha }(y-\mathbf {x}'\varvec{\beta }^*)]\) is further bounded by
As for the first term, by Conditions 1 and 2,
As for the second term,
since \(\mathbf {x}'(\varvec{\beta }^*-{\tilde{\varvec{\beta }}})\) is sub-Gaussian by Condition 3, and hence its 2kth moment is bounded by \(c^2\kappa _0^{2k}\) for a positive constant c depending only on k.
Combining these results, we have
This completes the proof. \(\square \)
Lemma 2
Let \(\xi _1,\xi _2,\ldots ,\xi _n\) be independent real valued random variables. Assume that there exist positive numbers \(\nu \) and c such that
and for all integers \(k\ge 3\)
Let \(S_n=\sum _{i=1}^n(\xi _i-\mathbb {E}[\xi _i])\), then for every positive x,
Proof
Details can be found in Proposition 2.9 of Massart (2007).
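For reference, a classical unnormalized statement of this type (our restatement based on Proposition 2.9 of Massart 2007) is: if \(\sum _{i=1}^n\mathbb {E}[\xi _i^2]\le \nu \) and \(\sum _{i=1}^n\mathbb {E}[(\xi _i)_+^k]\le \frac{k!}{2}\nu c^{k-2}\) for all integers \(k\ge 3\), then \(\mathbb {P}(S_n\ge \sqrt{2\nu x}+cx)\le e^{-x}\) for every positive x; the application in the proof of Theorem 2 below works with quantities rescaled by n, such as \(\sqrt{2\nu t/n}\).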
Lemma 3
Under Conditions 1–3, there exist universal positive constants \(\kappa _1',\kappa _2', c_1', c_2'\) such that, with probability at least \(1-c_1'\exp {(-c_2'n)}\),
for all \(\Vert \varvec{\beta }\Vert _2\le 4\rho _2\) and \(\Vert \varDelta \Vert _2\le 8\rho _2\), where \(\rho _2\) is a sufficiently large constant depending on R, and \(C=\max \{C_u,|C_l|\}\ge c_u\rho _2^{-1}\), where \(c_u\) is a positive constant depending on \(M_k,\kappa _l,\kappa _u\) and \(\kappa _0\).
Proof
Define the set \( A=\left\{ (\varvec{\beta },\varDelta ):\Vert \varvec{\beta }\Vert _2\le 4\rho _2, ~\Vert \varDelta \Vert _2\le 8\rho _2\right\} \), then we can show that for any \((\varvec{\beta },\varDelta )\in A\),
for some properly chosen T and \(\tau \) satisfying \(T+8\tau \rho _2\le \min \{C_u,|C_l|\}\), where the threshold function
To show (7.10), if \(|y_i-\mathbf {x}_i'\varvec{\beta }|>T\) or \(|\mathbf {x}_i'\varDelta |>\tau \Vert \varDelta \Vert _2\), the right hand side of (7.10) is 0. So by convexity of the robust loss function (2.2), (7.10) holds trivially. If \(|y_i-\mathbf {x}_i'\varvec{\beta }|\le T\) and \(|\mathbf {x}_i'\varDelta |\le \tau \Vert \varDelta \Vert _2\), then by Lemma 1,
Based on this inequality for \(\delta L_n(\varvec{\beta },\varDelta )\), we follow a proof procedure similar to that of Lemma 2 in Fan et al. (2017) and obtain the desired result.
Lemma 4
Suppose \(L_n(\varvec{\beta })\) is convex and that \({\tilde{\varvec{\beta }}}^*\) and \({\tilde{\varvec{\beta }}}^*+\varDelta \) lie in the feasible set, so that \(\Vert \varDelta \Vert _1\le 2R\). If the Restricted Strong Convexity condition holds for \(\Vert \varDelta \Vert _2\le 1\) and \(n\ge 4R^2 \tau _1^2\log p\), then
Proof
Details can be found in Lemma 8 of Loh and Wainwright (2015).
Proof of Theorem 2
If we can prove the following two claims, then by Theorem 1 of Loh and Wainwright (2015) the theorem holds:
- Claim I: \(\Vert \nabla L_n({\tilde{\varvec{\beta }}}^*)\Vert _{\infty }\le \lambda L/4\) with overwhelming probability;
- Claim II: the empirical loss \(L_n(\varvec{\beta })\) satisfies the Restricted Strong Convexity condition.
For Claim I, we use the Bernstein inequality (Lemma 2) and a union bound to establish the result. Through a straightforward calculation,
where \(\psi '_{\alpha }(\cdot )\) is defined in Eq. 3.3.
Note that \(\left| \psi '_{\alpha }(r;C_u,C_l)\right| \le 2\max \{\alpha ,1-\alpha \}|r|\) so that for \(j=1,\ldots ,p\),
Then
where the last inequality follows by an argument similar to that in the proof of Theorem 1, and \(\nu \) is a constant depending on \(\kappa _0\) and \(M_2\).
Denote \(C=2\max \{\alpha ,1-\alpha \}\times \max \{C_u,|C_l|\}\) and let \(A=\frac{|\psi '_{\alpha }(y_i-\mathbf {x}_i'{\tilde{\varvec{\beta }}}^*;C_u,C_l)|}{C}\); then \(|A|\le 1\). By Theorem 2.7 of Rivasplata (2012), \(A\mathbf {x}_{ij}\) is also sub-Gaussian with the same parameter \(\kappa _0\). Then for any \(k\ge 3\), by Proposition 3.2 of Rivasplata (2012), we have
where B is a constant depending on \(\kappa _0\).
Meanwhile by the definition of \({\tilde{\varvec{\beta }}}^*\), \(\mathbb {E}[\psi '_{\alpha }(y_i-\mathbf {x}_i'{\tilde{\varvec{\beta }}}^*;C_u,C_l)\mathbf {x}_{ij}]=0\) for \(j=1,\ldots ,p\). Then by the Bernstein inequality from Lemma 2, we have for \(j=1,\ldots ,p\)
Choosing \(t=\frac{n\lambda ^2L^2}{128\nu }\) and \(C\le \frac{16\nu }{B\lambda L}\), we have \(\frac{CBt}{n}\le \sqrt{\frac{2\nu t}{n}}\) and therefore
Through a union bound argument, we have
Choose \(\lambda =\kappa _{\lambda }\sqrt{\frac{\log p}{n}}\) with \(\kappa _{\lambda }\) large enough that \(c:=\frac{\kappa ^2_{\lambda }L^2}{128\nu }-1>0\); then \(\exp \left( -\frac{n\lambda ^2L^2}{128\nu }+\log p\right) \le \exp \left( -cn\right) \).
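Spelled out, and ignoring absolute constants, this step reads \(\mathbb {P}\big (\Vert \nabla L_n({\tilde{\varvec{\beta }}}^*)\Vert _{\infty }>\lambda L/4\big )\le \sum _{j=1}^{p}\mathbb {P}\big (|\nabla _j L_n({\tilde{\varvec{\beta }}}^*)|>\lambda L/4\big )\le p\exp \left( -\frac{n\lambda ^2L^2}{128\nu }\right) =\exp \left( -\frac{n\lambda ^2L^2}{128\nu }+\log p\right) \) (our sketch of the calculation).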
For Claim II, by Lemma 3, for \(\Vert \varDelta \Vert _2\le 8\rho _2\), with probability at least \(1-c_1'\exp {(-c_2'n)}\),
Using the fact that \(ab\le (a^2+b^2)/2\), we obtain that
with \(\kappa _1=\frac{1}{2}\kappa _1',~~\tau _1=\frac{1}{2}\kappa _1'\kappa _2^2\).
Without loss of generality, we assume \(\rho _2\ge 1/8\). So we have proved the first scenario of the Restricted Strong Convexity, i.e., that the Restricted Strong Convexity condition holds for the empirical loss \(L_n(\varvec{\beta })\) when \(\Vert \varDelta \Vert _2\le 1\). By Lemma 4, when \(n\ge 4R^2 \tau _1^2\log p\), the full Restricted Strong Convexity condition holds.
Then, based on the two claims above and a union bound argument, there exist positive constants \(c_1,c_2\) such that the statistical error bound holds with probability at least \(1-c_1\exp \{-c_2n\}\). \(\square \)