
Robust estimation and shrinkage in ultrahigh dimensional expectile regression with heavy tails and variance heterogeneity


Abstract

High-dimensional data subject to heavy-tailed phenomena and heterogeneity are commonly encountered in various scientific fields and bring new challenges to classical statistical methods. In this paper, we combine the asymmetric squared loss with a Huber-type robustification to develop robust expectile regression for ultrahigh dimensional heavy-tailed heterogeneous data. Unlike the classical Huber method, we introduce two different tuning parameters, one on each side, to accommodate possible asymmetry, and we allow them to diverge in order to reduce the bias induced by the robust approximation. In the regularized framework, we adopt folded concave penalty functions such as the SCAD or MCP penalty for the sake of bias reduction. We investigate the finite sample properties of the corresponding estimator and characterize how our method trades off estimation accuracy against heavy-tailedness. Based on our theoretical study, we also propose an efficient first-order optimization algorithm built on a locally linear approximation of the non-convex problem. Simulation studies under various distributions and a real data example demonstrate the satisfactory performance of our method in coefficient estimation, model selection and heterogeneity detection.
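To make the estimator concrete, the sketch below implements the two-sided Huber-type expectile loss in the decomposition \(\psi _{\alpha }=\phi _{\alpha }-g_{\alpha }\) used in the Appendix (Eq. 7.3), together with one locally linear approximation (LLA) step using the SCAD weight. It is a minimal illustration under our own naming and solver choices (plain proximal gradient), not the authors' released code; all constants in the usage lines are illustrative.

```python
# A minimal sketch (not the authors' released code) of the robust expectile
# loss and one locally linear approximation (LLA) step with a SCAD weight.
import numpy as np

def phi(r, alpha):
    # Asymmetric squared (expectile) loss of Newey and Powell (1987).
    return np.where(r >= 0, alpha, 1 - alpha) * r ** 2

def psi(r, alpha, Cu, Cl):
    # Huber-type robustification: quadratic on [Cl, Cu], linear outside,
    # with separate truncation levels Cu > 0 > Cl on the two sides.
    g = (alpha * (r - Cu) ** 2 * (r >= Cu)
         + (1 - alpha) * (r - Cl) ** 2 * (r <= Cl))
    return phi(r, alpha) - g

def psi_grad(r, alpha, Cu, Cl):
    # psi'_alpha(r) = 2*alpha*min(r, Cu) for r >= 0 and
    # 2*(1-alpha)*max(r, Cl) for r < 0, i.e. a clipped linear score.
    return 2 * np.where(r >= 0, alpha, 1 - alpha) * np.clip(r, Cl, Cu)

def scad_weight(beta, lam, a=3.7):
    # SCAD derivative p'_lam(|beta_j|) (Fan and Li 2001), the per-coordinate
    # weight of one LLA stage (Zou and Li 2008).
    b = np.abs(beta)
    return lam * ((b <= lam) + (b > lam) * np.maximum(a * lam - b, 0) / ((a - 1) * lam))

def lla_step(X, y, beta, alpha, Cu, Cl, lam, n_iter=500):
    # One LLA stage: the weighted-l1 surrogate solved by proximal gradient;
    # step size is 1/L with L = 2*max(alpha, 1-alpha)*||X||_2^2 / n.
    n, w = len(y), scad_weight(beta, lam)
    step = n / (2 * max(alpha, 1 - alpha) * np.linalg.norm(X, 2) ** 2)
    b = beta.copy()
    for _ in range(n_iter):
        z = b + step * X.T @ psi_grad(y - X @ b, alpha, Cu, Cl) / n
        b = np.sign(z) * np.maximum(np.abs(z) - step * w, 0.0)  # soft threshold
    return b

# Toy usage: sparse signal, t(3) noise; all constants here are illustrative.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = 2 * X[:, 0] - X[:, 1] + rng.standard_t(3, 200)
beta_hat = lla_step(X, y, np.zeros(50), alpha=0.5, Cu=3.0, Cl=-3.0, lam=0.15)
print("selected coordinates:", np.flatnonzero(beta_hat))
```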


References

  • Agarwal A, Negahban S, Wainwright MJ (2012) Fast global convergence of gradient methods for high-dimensional statistical recovery. The Annals of Statistics 40(5):2452–2482

  • Aigner DJ, Amemiya T, Poirier DJ (1976) On the estimation of production frontiers: maximum likelihood estimation of the parameters of a discontinuous density function. International Economic Review 17(2):377–396

  • Aitkin M (1987) Modelling variance heterogeneity in normal regression using GLIM. Journal of the Royal Statistical Society: Series C (Applied Statistics) 36(3):332–339

  • Bickel PJ (1984) Robust regression based on infinitesimal neighbourhoods. The Annals of Statistics 12:1349–1368

  • De Rossi G, Harvey A (2009) Quantiles, expectiles and splines. Journal of Econometrics 152(2):179–185

  • Efron B (1991) Regression percentiles using asymmetric squared error loss. Statistica Sinica 1:93–125

  • Fan J, Li Q, Wang Y (2017) Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79(1):247–265

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96(456):1348–1360

  • Fan J, Xue L, Zou H (2014) Strong oracle optimality of folded concave penalized estimation. The Annals of Statistics 42(3):819–849

  • Fu A, Narasimhan B, Boyd S (2017) CVXR: an R package for disciplined convex optimization. https://web.stanford.edu/~boyd/papers/cvxr_paper.html

  • Gu Y, Zou H (2016) High-dimensional generalizations of asymmetric least squares regression and their applications. The Annals of Statistics 44(6):2661–2694

  • Guo C, Yang H, Lv J (2017) Robust variable selection in high-dimensional varying coefficient models based on weighted composite quantile regression. Statistical Papers 58(4):1009–1033

  • Grant MC, Boyd SP (2013) CVX: Matlab software for disciplined convex programming, version 2.0 beta. http://cvxr.com/cvx

  • Grant MC, Boyd SP (2008) Graph implementations for nonsmooth convex programs. In: Recent advances in learning and control. Springer, pp 95–110

  • Huang CC, Liu K, Pope RM, Du P, Lin S, Rajamannan NM et al (2011) Activated TLR signaling in atherosclerosis among women with lower Framingham risk score: the Multi-Ethnic Study of Atherosclerosis. PLoS One 6(6):e21067

  • Huber PJ (1964) Robust estimation of a location parameter. The Annals of Mathematical Statistics 35:73–101

  • Huber PJ (1983) Minimax aspects of bounded-influence regression. Journal of the American Statistical Association 78(381):66–72

  • Kim M, Lee S (2016) Nonlinear expectile regression with application to value-at-risk and expected shortfall estimation. Computational Statistics & Data Analysis 94:1–19

  • Liu Y, Zeng P, Lin L (2020) Degrees of freedom for regularized regression with Huber loss and linear constraints. Statistical Papers 1–23

  • Loh PL, Wainwright MJ (2015) Regularized M-estimators with nonconvexity: statistical and algorithmic theory for local optima. Journal of Machine Learning Research 16:559–616

  • Maronna RA, Martin RD, Yohai VJ (2019) Robust statistics: theory and methods (with R). John Wiley & Sons

  • Massart P (2007) Concentration inequalities and model selection. Springer, Berlin

  • Newey WK, Powell JL (1987) Asymmetric least squares estimation and testing. Econometrica 55(4):819–847

  • Parikh N, Boyd S (2014) Proximal algorithms. Foundations and Trends in Optimization 1(3):127–239

  • Rigby RA, Stasinopoulos DM (1996) A semi-parametric additive model for variance heterogeneity. Statistics and Computing 6(1):57–65

  • Rivasplata O (2012) Subgaussian random variables: an expository note. Internet publication

  • Smucler E, Yohai VJ (2017) Robust and sparse estimators for linear regression models. Computational Statistics & Data Analysis 111:116–130

  • Sobotka F, Kneib T (2012) Geoadditive expectile regression. Computational Statistics & Data Analysis 56(4):755–767

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58(1):267–288

  • Waltrup LS, Sobotka F, Kneib T, Kauermann G (2015) Expectile and quantile regression-David and Goliath? Statistical Modelling 15(5):433–456

  • Wang L, Wu Y, Li R (2012) Quantile regression for analyzing heterogeneity in ultra-high dimension. Journal of the American Statistical Association 107(497):214–222

  • Wang L, Zheng C, Zhou W, Zhou W (2020) A new principle for tuning-free Huber regression. Statistica Sinica

  • Yao Q, Tong H (1996) Asymmetric least squares regression estimation: a nonparametric approach. Journal of Nonparametric Statistics 6(2–3):273–292

  • Zhang CH (2010) Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics 38(2):894–942

  • Zhao J, Chen Y, Zhang Y (2018) Expectile regression for analyzing heteroscedasticity in high dimension. Statistics & Probability Letters 137:304–311

  • Zou H, Li R (2008) One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics 36(4):1509–1533

  • Ziegel JF (2016) Coherence and elicitability. Mathematical Finance 26(4):901–918


Acknowledgements

The authors thank the Editor and anonymous referees for their valuable comments and suggestions. This research is partly supported by National Statistical Science Research Project (No. 2018LY30), Zhejiang Provincial Natural Science Foundation (No: LY18A010005) and the Research Project of Humanities and Social Science of Ministry of Education of China (No. 17YJA910003).

Author information

Corresponding author

Correspondence to Yi Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Lemma 1

The loss function \(\phi (\cdot )\) defined in (2.1) is continuously differentiable. Moreover, for any \(r,r_0\in \mathbb {R}\), we have

$$\begin{aligned} \min \{\alpha ,1-\alpha \}\cdot (r-r_0)^2\le \phi _{\alpha }(r)-\phi _{\alpha }(r_0)-\phi '_{\alpha }(r_0)\cdot (r-r_0)\le \max \{\alpha ,1-\alpha \}\cdot (r-r_0)^2. \end{aligned}$$
(7.1)

Proof

Details can be found in Gu and Zou (2016).
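As a quick numerical sanity check (ours, not from the paper), the sandwich bound (7.1) can be probed on random pairs:

```python
# Spot-check of the sandwich inequality (7.1) for the asymmetric squared
# loss; a small self-contained sketch, not part of the original paper.
import numpy as np

def phi(r, a):
    return np.where(r >= 0, a, 1 - a) * r ** 2

def dphi(r, a):
    return 2 * np.where(r >= 0, a, 1 - a) * r

rng = np.random.default_rng(1)
a = 0.8
r, r0 = rng.standard_normal(100_000), rng.standard_normal(100_000)
bregman = phi(r, a) - phi(r0, a) - dphi(r0, a) * (r - r0)
sq = (r - r0) ** 2
assert np.all(bregman >= min(a, 1 - a) * sq - 1e-12)
assert np.all(bregman <= max(a, 1 - a) * sq + 1e-12)
print("(7.1) holds on all", r.size, "sampled pairs")
```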

Proof of Theorem 1

For simplicity of notation, we suppress the dependence on the pre-determined parameters \(C_u,~C_l\) and write \({\tilde{\varvec{\beta }}}^*:=\varvec{\beta }^*(C_u,C_l)\) for short.

By Eq. (3.1) and Lemma 1,

$$\begin{aligned} \mathbb {E}[\phi _{\alpha }(y-\mathbf {x}'{\tilde{\varvec{\beta }}}^*)-\phi _{\alpha }(y-\mathbf {x}'\varvec{\beta }^*)]\ge & {} \min \{\alpha ,1-\alpha \}\mathbb {E}[(\mathbf {x}'{\tilde{\varvec{\beta }}}^*-\mathbf {x}'\varvec{\beta }^*)^2]\nonumber \\= & {} \min \{\alpha ,1-\alpha \}({\tilde{\varvec{\beta }}}^*-\varvec{\beta }^*)'\mathbb {E}(\mathbf {x}\mathbf {x}')({\tilde{\varvec{\beta }}}^*-\varvec{\beta }^*)\nonumber \\\ge & {} \min \{\alpha ,1-\alpha \}\kappa _l\Vert {\tilde{\varvec{\beta }}}^*-\varvec{\beta }^*\Vert _2^2 \end{aligned}$$
(7.2)

where the cross term from Lemma 1 vanishes by the first-order condition \(\mathbb {E}[\phi '_{\alpha }(y-\mathbf {x}'\varvec{\beta }^*)\,\mathbf {x}]=\mathbf {0}\), and the last inequality follows by Condition 2.

On the other hand, \({\tilde{\varvec{\beta }}}^*=\underset{\varvec{\beta } \in \mathbb {R}^{p}}{\arg \min }~\mathbb {E}\psi _{\alpha }(y-\mathbf {x}'\varvec{\beta };C_u,C_l)\). Then

$$\begin{aligned}&\mathbb {E}[\phi _{\alpha }(y-\mathbf {x}'{\tilde{\varvec{\beta }}}^*)-\phi _{\alpha }(y-\mathbf {x}'\varvec{\beta }^*)]\\&\quad =\mathbb {E}[\phi _{\alpha }(y-\mathbf {x}'{\tilde{\varvec{\beta }}}^*)-\psi _{\alpha }(y-\mathbf {x}'{\tilde{\varvec{\beta }}}^*)]\\&\qquad +\;\mathbb {E}[\psi _{\alpha }(y-\mathbf {x}'{\tilde{\varvec{\beta }}}^*)-\psi _{\alpha }(y-\mathbf {x}'\varvec{\beta }^*)]\\&\qquad -\;\mathbb {E}[\phi _{\alpha }(y-\mathbf {x}'\varvec{\beta }^*)-\psi _{\alpha }(y-\mathbf {x}'\varvec{\beta }^*)]\\&\quad \le \mathbb {E}[g_{\alpha }(y-\mathbf {x}'{\tilde{\varvec{\beta }}}^*)-g_{\alpha }(y-\mathbf {x}'\varvec{\beta }^*)], \end{aligned}$$

where the inequality uses that \({\tilde{\varvec{\beta }}}^*\) minimizes \(\mathbb {E}\psi _{\alpha }(y-\mathbf {x}'\varvec{\beta };C_u,C_l)\), and \(g_{\alpha }(\cdot )\) is defined by

$$\begin{aligned} g_{\alpha }(r)=\phi _{\alpha }(r)-\psi _{\alpha }(r)=\alpha (r-C_u)^2\mathbb {I}(r\ge C_u)+(1-\alpha )(r-C_l)^2\mathbb {I}(r\le C_l). \end{aligned}$$
(7.3)

Note \(g_{\alpha }(r)\) is continuous and differentiable and

$$\begin{aligned} g'_{\alpha }(r)=2\alpha (r-C_u)\mathbb {I}(r\ge C_u)+2(1-\alpha )(r-C_l)\mathbb {I}(r\le C_l). \end{aligned}$$

So by the mean value theorem, there exists some \({\tilde{\varvec{\beta }}}\) on the line segment between \({\tilde{\varvec{\beta }}}^*\) and \(\varvec{\beta }^*\) such that

$$\begin{aligned}&\left| \mathbb {E}[g_{\alpha }(y-\mathbf {x}'{\tilde{\varvec{\beta }}}^*)-g_{\alpha }(y-\mathbf {x}'\varvec{\beta }^*)]\right| \\&\quad \le \mathbb {E}\left[ |g_{\alpha }'(y-\mathbf {x}'{\tilde{\varvec{\beta }}})|\times |\mathbf {x}'({\tilde{\varvec{\beta }}}^*-\varvec{\beta }^*)|\right] \\&\quad \le 2\max \{\alpha ,1-\alpha \}\mathbb {E}\left[ ({\tilde{r}}-C)\mathbb {I}({\tilde{r}}\ge C)\times |\mathbf {x}'({\tilde{\varvec{\beta }}}^*-\varvec{\beta }^*)|\right] \end{aligned}$$

where \({\tilde{r}}=|y-\mathbf {x}'{\tilde{\varvec{\beta }}}|\) and \(C=\min \{C_u,|C_l|\}\).

Denote by \(P_{\epsilon }\) the conditional distribution of \(\epsilon \) given \(\mathbf {x}\) and by \(\mathbb {E}_{\epsilon }\) the corresponding conditional expectation. Then

$$\begin{aligned} \mathbb {E}_{\epsilon }({\tilde{r}}-C)\mathbb {I}({\tilde{r}}\ge C)= & {} \int _{0}^{\infty }P_{\epsilon }({\tilde{r}}\mathbb {I}({\tilde{r}}\ge C)>t)dt-CP_{\epsilon }({\tilde{r}}\ge C)\nonumber \\= & {} \int _{0}^{\infty }P_{\epsilon }({\tilde{r}}>t,~{\tilde{r}}\ge C)dt-CP_{\epsilon }({\tilde{r}}\ge C)\nonumber \\= & {} \int _{C}^{\infty }P_{\epsilon }({\tilde{r}}>t)dt+\int _{0}^{C}P_{\epsilon }({\tilde{r}}\ge C)dt-CP_{\epsilon }({\tilde{r}}\ge C)\nonumber \\\le & {} \int _{C}^{\infty }\frac{\mathbb {E}_{\epsilon }[{\tilde{r}}^k]}{t^k}dt=\frac{1}{k-1}C^{1-k}\mathbb {E}_{\epsilon }[{\tilde{r}}^k] \end{aligned}$$
(7.4)

where the second and third terms in the third line cancel, and the last inequality follows from Markov's inequality applied to \(P_{\epsilon }({\tilde{r}}>t)\).

Therefore, \(\mathbb {E}[\phi _{\alpha }(y-\mathbf {x}'{\tilde{\varvec{\beta }}}^*)-\phi _{\alpha }(y-\mathbf {x}'\varvec{\beta }^*)]\) is further bounded by

$$\begin{aligned}&\frac{2}{k-1}C^{1-k}\mathbb {E}\left[ |y-\mathbf {x}'{\tilde{\varvec{\beta }}}|^k|\mathbf {x}'({\tilde{\varvec{\beta }}}^*-\varvec{\beta }^*)|\right] \nonumber \\&\quad =\frac{2}{k-1}C^{1-k}\mathbb {E}\left[ |\epsilon +\mathbf {x}'(\varvec{\beta }^*-{\tilde{\varvec{\beta }}})|^k|\mathbf {x}'({\tilde{\varvec{\beta }}}^*-\varvec{\beta }^*)|\right] \nonumber \\&\quad \le \frac{2}{k-1}(\frac{C}{2})^{1-k}\mathbb {E}\left[ (|\epsilon |^k+|\mathbf {x}'(\varvec{\beta }^*-{\tilde{\varvec{\beta }}})|^k)|\mathbf {x}'({\tilde{\varvec{\beta }}}^*-\varvec{\beta }^*)|\right] \nonumber \\&\quad =\frac{2}{k-1}(\frac{C}{2})^{1-k}\left\{ \mathbb {E}\left[ |\epsilon |^k|\mathbf {x}'({\tilde{\varvec{\beta }}}^*-\varvec{\beta }^*)|\right] \right. \nonumber \\&\qquad \left. +\;\mathbb {E}\left[ |\mathbf {x}'(\varvec{\beta }^*-{\tilde{\varvec{\beta }}})|^k|\mathbf {x}'({\tilde{\varvec{\beta }}}^*-\varvec{\beta }^*)|\right] \right\} \end{aligned}$$
(7.5)

where the inequality uses \(|a+b|^k\le 2^{k-1}(|a|^k+|b|^k)\). For the first term, by Conditions 1 and 2,

$$\begin{aligned} \mathbb {E}\left[ |\epsilon |^k|\mathbf {x}'({\tilde{\varvec{\beta }}}^*-\varvec{\beta }^*)|\right]= & {} \mathbb {E}\left[ \mathbb {E}(|\epsilon |^k|\mathbf {x})|\mathbf {x}'({\tilde{\varvec{\beta }}}^*-\varvec{\beta }^*)|\right] \nonumber \\\le & {} \left[ \mathbb {E}(\mathbb {E}(|\epsilon |^k|\mathbf {x}))^2\right] ^{1/2}\left[ \mathbb {E}(|\mathbf {x}'({\tilde{\varvec{\beta }}}^*-\varvec{\beta }^*)|^2)\right] ^{1/2}\nonumber \\\le & {} M_k\sqrt{\kappa _u}\Vert {\tilde{\varvec{\beta }}}^*-\varvec{\beta }^*\Vert _2 \end{aligned}$$
(7.6)

As for the second term,

$$\begin{aligned} \mathbb {E}\left[ |\mathbf {x}'(\varvec{\beta }^*-{\tilde{\varvec{\beta }}})|^k|\mathbf {x}'({\tilde{\varvec{\beta }}}^*-\varvec{\beta }^*)|\right]\le & {} [\mathbb {E}(|\mathbf {x}'(\varvec{\beta }^*-{\tilde{\varvec{\beta }}})|^{2k})]^{1/2}[\mathbb {E}(|\mathbf {x}'({\tilde{\varvec{\beta }}}^*-\varvec{\beta }^*)|^2)]^{1/2}\nonumber \\\le & {} c\kappa _0^k\sqrt{\kappa _u}\Vert {\tilde{\varvec{\beta }}}^*-\varvec{\beta }^*\Vert _2 \end{aligned}$$
(7.7)

since \(\mathbf {x}'(\varvec{\beta }^*-{\tilde{\varvec{\beta }}})\) is sub-Gaussian by Condition 3, so its 2k-th moment is bounded by \(c^2\kappa _0^{2k}\) for a universal positive constant c depending only on k.

Combining these results, we have

$$\begin{aligned} \Vert \varvec{\beta }^*(C_u,C_l)-\varvec{\beta }^*\Vert _2\le \frac{2^k}{(k-1)}\frac{\max \{\alpha ,1-\alpha \}}{\min \{\alpha ,1-\alpha \}}\frac{\sqrt{\kappa _u}}{\kappa _l}(M_k+c\kappa _0^k)C^{1-k}. \end{aligned}$$
(7.8)

This completes the proof. \(\square \)
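To see the \(C^{1-k}\) robustification bias of Theorem 1 concretely, the toy Monte Carlo below (our own one-dimensional location-model illustration, with symmetric truncation \(C_u=-C_l=C\)) tracks the gap between the truncated-loss minimizer and the true \(\alpha \)-expectile as C grows:

```python
# Toy illustration of Theorem 1 (our own sketch): with t(3) errors the
# population minimizer of the truncated loss approaches the alpha-expectile,
# and the gap shrinks as the common truncation level C increases.
import numpy as np
from scipy.optimize import minimize_scalar

def phi(r, a):
    return np.where(r >= 0, a, 1 - a) * r ** 2

def psi(r, a, Cu, Cl):
    g = a * (r - Cu) ** 2 * (r >= Cu) + (1 - a) * (r - Cl) ** 2 * (r <= Cl)
    return phi(r, a) - g

rng = np.random.default_rng(0)
eps = rng.standard_t(3, 200_000)   # heavy tails: E|eps|^k finite only for k < 3
a = 0.7
mu_star = minimize_scalar(lambda m: phi(eps - m, a).mean()).x  # alpha-expectile
for C in (1.0, 2.0, 4.0, 8.0, 16.0):
    mu_C = minimize_scalar(lambda m: psi(eps - m, a, C, -C).mean()).x
    print(f"C = {C:5.1f}   |mu_C - mu*| = {abs(mu_C - mu_star):.5f}")
```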

Lemma 2

Let \(\xi _1,\xi _2,\ldots ,\xi _n\) be independent real-valued random variables. Assume that there exist positive numbers \(\nu \) and c such that

$$\begin{aligned} \sum _{i=1}^n\mathbb {E}[\xi _i^2]\le \nu , \end{aligned}$$

and for all integers \(k\ge 3\)

$$\begin{aligned} \sum _{i=1}^n\mathbb {E}[(\xi _i)_+^k]\le \frac{k!}{2}\nu c^{k-2}. \end{aligned}$$

Let \(S_n=\sum _{i=1}^n(\xi _i-\mathbb {E}[\xi _i])\). Then for every positive x,

$$\begin{aligned} \mathbb {P}\left( S_n\ge \sqrt{2\nu x}+cx\right) \le \exp (-x). \end{aligned}$$

Proof

Details can be found in Proposition 2.9 of Massart (2007).
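For intuition, the bound of Lemma 2 is easy to probe by simulation; in the sketch below (ours), the summands are bounded by 1, so the moment conditions hold with the crude choices \(\nu =n\) and \(c=1\):

```python
# Empirical probe of the Bernstein-type bound in Lemma 2 using mean-zero
# variables bounded by 1, for which sum E[xi^2] <= n =: nu and c = 1
# satisfy the stated moment conditions. Our own sketch.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 200_000
nu, c = float(n), 1.0
Sn = rng.uniform(-1, 1, size=(reps, n)).sum(axis=1)   # E[xi_i] = 0
for x in (0.5, 1.0, 2.0, 4.0):
    empirical = np.mean(Sn >= np.sqrt(2 * nu * x) + c * x)
    print(f"x = {x}: P(Sn >= sqrt(2*nu*x) + c*x) = {empirical:.5f} <= exp(-x) = {np.exp(-x):.5f}")
```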

Lemma 3

Under Conditions 1–3, there exist universal positive constants \(\kappa _1',\kappa _2', c_1', c_2'\) such that with probability at least \(1-c_1'\exp {(-c_2'n)}\),

$$\begin{aligned} \delta L_n(\varvec{\beta },\varDelta )\ge \kappa _1'\Vert \varDelta \Vert _2\left[ \Vert \varDelta \Vert _2-\kappa _2'\sqrt{\frac{\log p}{n}}\Vert \varDelta \Vert _1\right] \end{aligned}$$
(7.9)

for all \(\Vert \varvec{\beta }\Vert _2\le 4\rho _2\) and \(\Vert \varDelta \Vert _2\le 8\rho _2\), where \(\rho _2\) is a sufficiently large constant depending on R, and \(C=\max \{C_u,|C_l|\}\ge c_u\rho _2^{-1}\), where \(c_u\) is a positive constant depending on \(M_k,\kappa _l,\kappa _u\) and \(\kappa _0\).

Proof

Define the set \( A=\left\{ (\varvec{\beta },\varDelta ):\Vert \varvec{\beta }\Vert _2\le 4\rho _2, ~\Vert \varDelta \Vert _2\le 8\rho _2\right\} \), then we can show that for any \((\varvec{\beta },\varDelta )\in A\),

$$\begin{aligned} \delta L_n(\varvec{\beta },\varDelta )\ge \frac{\min \{\alpha ,1-\alpha \}}{n}\sum _{i=1}^n \varphi _{\tau \Vert \varDelta \Vert _2}(\mathbf {x}_i'\varDelta )\mathbb {I}(|y_i-\mathbf {x}_i'\varvec{\beta }|<T). \end{aligned}$$
(7.10)

for some properly chosen T and \(\tau \) satisfying \(T+8\tau \rho _2\le \min \{C_u,|C_l|\}\), where the threshold function is

$$\begin{aligned} \varphi _t(u)=u^2\mathbb {I}(|u|\le t/2)+(t-|u|)^2\mathbb {I}(t/2< |u|\le t). \end{aligned}$$
(7.11)

To show (7.10), note first that if \(|y_i-\mathbf {x}_i'\varvec{\beta }|>T\) or \(|\mathbf {x}_i'\varDelta |>\tau \Vert \varDelta \Vert _2\), the corresponding summand on the right-hand side of (7.10) is 0, so the inequality holds trivially by the convexity of the robust loss function (2.2). If \(|y_i-\mathbf {x}_i'\varvec{\beta }|\le T\) and \(|\mathbf {x}_i'\varDelta |\le \tau \Vert \varDelta \Vert _2\), then by Lemma 1,

$$\begin{aligned} \delta L_n(\varvec{\beta },\varDelta )\ge & {} \frac{\min \{\alpha ,1-\alpha \}}{n}\sum _{i=1}^n (\mathbf {x}_i'\varDelta )^2\nonumber \\\ge & {} \frac{\min \{\alpha ,1-\alpha \}}{n}\sum _{i=1}^n \varphi _{\tau \Vert \varDelta \Vert _2}(\mathbf {x}_i'\varDelta )\mathbb {I}(|y_i-\mathbf {x}_i'\varvec{\beta }|<T). \end{aligned}$$
(7.12)

Based on this inequality for \(\delta L_n(\varvec{\beta },\varDelta )\), we follow a proof procedure similar to that of Lemma 2 in Fan et al. (2017) and obtain the desired result. \(\square \)
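The truncation function (7.11) simply caps the squared increment and switches off once \(|u|\) exceeds t; a short rendering of its shape (our own code):

```python
# The threshold function phi_t of Eq. (7.11): equals u^2 for |u| <= t/2,
# decays to 0 on t/2 < |u| <= t, and vanishes beyond t, so it lower-bounds
# u^2 near the origin while ignoring large increments. Our own sketch.
import numpy as np

def phi_t(u, t):
    au = np.abs(u)
    return np.where(au <= t / 2, u ** 2,
                    np.where(au <= t, (t - au) ** 2, 0.0))

u = np.linspace(-2.0, 2.0, 9)
print(np.round(phi_t(u, 2.0), 3))   # symmetric, peaks at (t/2)^2 = 1
```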

Lemma 4

Suppose \(L_n(\varvec{\beta })\) is convex and both \({\tilde{\varvec{\beta }}}^*\) and \({\tilde{\varvec{\beta }}}^*+\varDelta \) lie in the feasible set, so that \(\Vert \varDelta \Vert _1\le 2R\). If the Restricted Strong Convexity condition holds for \(\Vert \varDelta \Vert _2\le 1\) and \(n\ge 4R^2 \tau _1^2\log p\), then

$$\begin{aligned} \delta L_n({\tilde{\varvec{\beta }}}^*,\varDelta )\ge \kappa _1 \Vert \varDelta \Vert _2-\sqrt{\frac{\log p}{n}}\Vert \varDelta \Vert _1 \quad \forall \,\Vert \varDelta \Vert _2 \ge 1. \end{aligned}$$
(7.13)

Proof

Details can be found in Lemma 8 of Loh and Wainwright (2015).

Proof of Theorem 2

If we can prove the following two claims, then the theorem follows from Theorem 1 of Loh and Wainwright (2015):

  • Claim I: \(\Vert \nabla L_n({\tilde{\varvec{\beta }}}^*)\Vert _{\infty }\le \lambda L/4\) with overwhelming probability;

  • Claim II: the empirical loss \(L_n(\varvec{\beta })\) satisfies the Restricted Strong Convexity condition.

For Claim I, we use the Bernstein inequality (Lemma 2) and a union bound. Direct calculation gives

$$\begin{aligned} \nabla L_n({\tilde{\varvec{\beta }}}^*)=-\frac{1}{n}\sum _{i=1}^n\psi '_{\alpha }(y_i-\mathbf {x}_i'{\tilde{\varvec{\beta }}}^*;C_u,C_l)\mathbf {x}_i, \end{aligned}$$
(7.14)

where \(\psi '_{\alpha }(\cdot )\) is defined in Eq. (3.3).
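Concretely, \(\psi '_{\alpha }\) is a clipped linear score, so the gradient (7.14) is straightforward to implement; below is a self-contained sketch (ours, assuming \(C_l<0<C_u\)) with a finite-difference check:

```python
# Gradient (7.14) of the empirical robust loss: psi'_alpha is linear inside
# [Cl, Cu] and constant outside. Our own sketch with a finite-difference
# check; names and constants are illustrative.
import numpy as np

def psi(r, a, Cu, Cl):
    g = a * (r - Cu) ** 2 * (r >= Cu) + (1 - a) * (r - Cl) ** 2 * (r <= Cl)
    return np.where(r >= 0, a, 1 - a) * r ** 2 - g

def grad_Ln(beta, X, y, a, Cu, Cl):
    r = y - X @ beta
    score = 2 * np.where(r >= 0, a, 1 - a) * np.clip(r, Cl, Cu)
    return -X.T @ score / len(y)

rng = np.random.default_rng(3)
X, b0 = rng.standard_normal((40, 5)), rng.standard_normal(5)
y = X @ b0 + rng.standard_t(3, 40)
Ln = lambda b: psi(y - X @ b, 0.6, 2.0, -1.5).mean()
e, h = np.eye(5), 1e-6
num = np.array([(Ln(b0 + h * e[j]) - Ln(b0 - h * e[j])) / (2 * h) for j in range(5)])
print(np.allclose(num, grad_Ln(b0, X, y, 0.6, 2.0, -1.5), atol=1e-5))  # True
```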

Note that \(\left| \psi '_{\alpha }(r;C_u,C_l)\right| \le 2\max \{\alpha ,1-\alpha \}|r|\) so that for \(j=1,\ldots ,p\),

$$\begin{aligned} \left| \psi '_{\alpha }(y_i-\mathbf {x}_i'{\tilde{\varvec{\beta }}}^*;C_u,C_l)\mathbf {x}_{ij}\right| \le 2\max \{\alpha ,1-\alpha \}\left| y_i-\mathbf {x}_i'{\tilde{\varvec{\beta }}}^*\right| \left| \mathbf {x}_{ij}\right| . \end{aligned}$$
(7.15)

Then

$$\begin{aligned} \mathbb {E}[\left| \psi '_{\alpha }(y_i-\mathbf {x}_i'{\tilde{\varvec{\beta }}}^*;C_u,C_l)\mathbf {x}_{ij}\right| ^2]\le & {} 4 \mathbb {E}[\left| y_i-\mathbf {x}_i'{\tilde{\varvec{\beta }}}^*\right| ^2\left| \mathbf {x}_{ij}\right| ^2]\nonumber \\\le & {} 8\mathbb {E}\left[ \left( \epsilon _i^2+\left| \mathbf {x}_i'({\tilde{\varvec{\beta }}}^*-\varvec{\beta }^*)\right| ^2\right) \left| \mathbf {x}_{ij}\right| ^2\right] \nonumber \\= & {} 8\mathbb {E}\left[ \mathbb {E}(\epsilon _i^2|\mathbf {x}_i)\mathbf {x}_{ij}^2+\left| \mathbf {x}_i'({\tilde{\varvec{\beta }}}^*-\varvec{\beta }^*)\right| ^2\left| \mathbf {x}_{ij}\right| ^2\right] \nonumber \\\le & {} \nu . \end{aligned}$$
(7.16)

where the last inequality follows from an argument similar to that in the proof of Theorem 1, and \(\nu \) is a constant depending on \(\kappa _0\) and \(M_2\).

Denote \(C=2\max \{\alpha ,1-\alpha \}\times \max \{C_u,|C_l|\}\) and let \(A=\frac{|\psi '_{\alpha }(y_i-\mathbf {x}_i'{\tilde{\varvec{\beta }}}^*;C_u,C_l)|}{C}\), so that \(|A|\le 1\). By Theorem 2.7 of Rivasplata (2012), \(A\mathbf {x}_{ij}\) is also sub-Gaussian with the same parameter \(\kappa _0\). Then for any \(k\ge 3\), by Proposition 3.2 of Rivasplata (2012), we have

$$\begin{aligned} \mathbb {E}\left[ \left| \psi '_{\alpha }(y_i-\mathbf {x}_i'{\tilde{\varvec{\beta }}}^*;C_u,C_l)\mathbf {x}_{ij}\right| ^k\right] \le C^k \mathbb {E}[|A\mathbf {x}_{ij}|^k] \le \frac{k!}{2}(CB)^{k-2}\nu , \end{aligned}$$
(7.17)

where B is a constant depending on \(\kappa _0\).

Meanwhile by the definition of \({\tilde{\varvec{\beta }}}^*\), \(\mathbb {E}[\psi '_{\alpha }(y_i-\mathbf {x}_i'{\tilde{\varvec{\beta }}}^*;C_u,C_l)\mathbf {x}_{ij}]=0\) for \(j=1,\ldots ,p\). Then by the Bernstein inequality from Lemma 2, we have for \(j=1,\ldots ,p\)

$$\begin{aligned} \mathbb {P}\left( \left| -\frac{1}{n}\sum _{i=1}^n\psi '_{\alpha }(y_i-\mathbf {x}_i'{\tilde{\varvec{\beta }}}^*;C_u,C_l)\mathbf {x}_{ij}\right| \ge \sqrt{2\frac{\nu }{n}t}+\frac{CB}{n}t\right) \le \exp (-t). \end{aligned}$$
(7.18)

Choosing \(t=\frac{n\lambda ^2L^2}{128\nu }\) and \(C\le \frac{16\nu }{B\lambda L}\), we have \(\frac{CBt}{n}\le \sqrt{\frac{2\nu t}{n}}\) and therefore,

$$\begin{aligned} \mathbb {P}\left( \left| -\frac{1}{n}\sum _{i=1}^n\psi '_{\alpha }(y_i-\mathbf {x}_i'{\tilde{\varvec{\beta }}}^*;C_u,C_l)\mathbf {x}_{ij}\right| \ge \frac{\lambda L}{4}\right) \le \exp \left( -\frac{n\lambda ^2L^2}{128\nu }\right) . \end{aligned}$$
(7.19)

By a union bound argument, we have

$$\begin{aligned} \mathbb {P}\left( \left\| -\frac{1}{n}\sum _{i=1}^n\psi '_{\alpha }(y_i-\mathbf {x}_i'{\tilde{\varvec{\beta }}}^*;C_u,C_l)\mathbf {x}_{i}\right\| _{\infty }\ge \frac{\lambda L}{4}\right)\le & {} p\exp \left( -\frac{n\lambda ^2L^2}{128\nu }\right) \nonumber \\= & {} \exp \left( -\frac{n\lambda ^2L^2}{128\nu }+\log p\right) .\nonumber \\ \end{aligned}$$
(7.20)

Choosing \(\lambda =\kappa _{\lambda }\sqrt{\frac{\log p}{n}}\) with \(\kappa _{\lambda }\) large enough that \(c:=\frac{\kappa ^2_{\lambda }L^2}{128\nu }-1>0\), the right-hand side becomes \(\exp \left( -\frac{n\lambda ^2L^2}{128\nu }+\log p\right) =\exp \left( -c\log p\right) \), which establishes Claim I.

For Claim II, by Lemma 3, for \(\Vert \varDelta \Vert _2\le 8\rho _2\), with probability at least \(1-c_1'\exp {(-c_2'n)}\),

$$\begin{aligned} \delta L_n(\varvec{\beta },\varDelta )\ge \kappa _1'\Vert \varDelta \Vert _2^2-\kappa _1'\kappa _2'\sqrt{\frac{\log p}{n}}\Vert \varDelta \Vert _1\Vert \varDelta \Vert _2. \end{aligned}$$
(7.21)

Using the fact that \(ab\le (a^2+b^2)/2\), we obtain that

$$\begin{aligned} \delta L_n(\varvec{\beta },\varDelta )\ge & {} \kappa _1'\Vert \varDelta \Vert _2^2-\left( \frac{1}{2}\kappa _1'\Vert \varDelta \Vert _2^2+\frac{1}{2}\kappa _1'\kappa _2'^2\frac{\log p}{n}\Vert \varDelta \Vert _1^2\right) \nonumber \\= & {} \kappa _1\Vert \varDelta \Vert _2^2-\tau _1\frac{\log p}{n}\Vert \varDelta \Vert _1^2 \end{aligned}$$
(7.22)

with \(\kappa _1=\frac{1}{2}\kappa _1'\) and \(\tau _1=\frac{1}{2}\kappa _1'\kappa _2'^2\).

Without loss of generality, assume \(\rho _2\ge 1/8\). This verifies the first part of the Restricted Strong Convexity condition, namely that it holds for the empirical loss \(L_n(\varvec{\beta })\) when \(\Vert \varDelta \Vert _2\le 1\). By Lemma 4, when \(n\ge 4R^2 \tau _1^2\log p\), the full Restricted Strong Convexity condition holds.

Combining the two claims above via a union bound, there exist positive constants \(c_1,c_2\) such that with probability at least \(1-c_1\exp \{-c_2n\}\), the statistical error bound holds. \(\square \)


Cite this article

Zhao, J., Yan, G. & Zhang, Y. Robust estimation and shrinkage in ultrahigh dimensional expectile regression with heavy tails and variance heterogeneity. Stat Papers 63, 1–28 (2022). https://doi.org/10.1007/s00362-021-01227-2
