Abstract
High-dimensional data subject to heavy-tailed phenomena and heterogeneity are commonly encountered in various scientific fields and bring new challenges to classical statistical methods. In this paper, we combine the asymmetric squared loss with a Huber-type robustification to develop robust expectile regression for ultrahigh-dimensional heavy-tailed heterogeneous data. Unlike the classical Huber method, we introduce two different tuning parameters, one on each side, to account for possible asymmetry, and we allow them to diverge to reduce the bias induced by the robust approximation. In the regularized framework, we adopt general folded concave penalty functions, such as the SCAD and MCP penalties, for the sake of bias reduction. We investigate the finite-sample properties of the corresponding estimator and characterize how our method trades off estimation accuracy against heavy-tailedness. Based on our theoretical study, we also propose an efficient first-order optimization algorithm applied after a local linear approximation of the non-convex problem. Simulation studies under various distributions and a real data example demonstrate the satisfactory performance of our method in coefficient estimation, model selection and heterogeneity detection.
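To make the construction concrete, the sketch below (ours, in Python) evaluates the asymmetric squared loss and one plausible Huber-type truncation of it with separate levels on the two sides; the function names, the convention \(c_l<0<c_u\) and the exact piecewise form are illustrative assumptions rather than the paper's definition in Sect. 2.

```python
import numpy as np

def expectile_loss(r, alpha):
    """Asymmetric squared loss: alpha * r^2 if r >= 0, (1 - alpha) * r^2 if r < 0."""
    r = np.asarray(r, dtype=float)
    w = np.where(r >= 0, alpha, 1.0 - alpha)
    return w * r ** 2

def robust_expectile_loss(r, alpha, c_u, c_l):
    """Huber-type truncation with separate levels c_l < 0 < c_u:
    quadratic on [c_l, c_u], linear growth outside, joined so that the
    loss remains continuously differentiable."""
    r = np.asarray(r, dtype=float)
    loss = expectile_loss(r, alpha)
    loss = np.where(r > c_u, alpha * (2.0 * c_u * r - c_u ** 2), loss)
    loss = np.where(r < c_l, (1.0 - alpha) * (2.0 * c_l * r - c_l ** 2), loss)
    return loss

# Example: alpha = 0.7 weights positive residuals more heavily; residuals beyond
# the truncation levels are penalized only linearly, which limits the influence
# of heavy-tailed errors.
r = np.array([-5.0, -0.5, 0.0, 0.5, 5.0])
print(robust_expectile_loss(r, alpha=0.7, c_u=2.0, c_l=-2.0))
```

Letting the truncation levels grow with the sample size, as discussed in the abstract, shrinks the gap between this robustified loss and the plain asymmetric squared loss.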
References
Agarwal A, Negahban S, Wainwright MJ (2012) Fast global convergence of gradient methods for high-dimensional statistical recovery. The Annals of Statistics 40(5):2452–2482
Aigner DJ, Amemiya T, Poirier DJ (1976) On the estimation of production frontiers: maximum likelihood estimation of the parameters of a discontinuous density function. International Economic Review 17(2):377–396
Aitkin M (1987) Modelling variance heterogeneity in normal regression using GLIM. Journal of the Royal Statistical Society: Series C (Applied Statistics) 36(3):332–339
Bickel PJ (1984) Robust regression based on infinitesimal neighbourhoods. The Annals of Statistics 12:1349–1368
De Rossi G, Harvey A (2009) Quantiles, expectiles and splines. Journal of Econometrics 152(2):179–185
Efron B (1991) Regression percentiles using asymmetric squared error loss. Statistica Sinica 93–125
Fan J, Li Q, Wang Y (2017) Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79(1):247–265
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96(456):1348–1360
Fan J, Xue L, Zou H (2014) Strong oracle optimality of folded concave penalized estimation. The Annals of Statistics 42(3):819–849
Fu A, Narasimhan B, Boyd S (2017) CVXR: An R package for disciplined convex optimization. https://web.stanford.edu/~boyd/papers/cvxr_paper.html
Gu Y, Zou H (2016) High-dimensional generalizations of asymmetric least squares regression and their applications. The Annals of Statistics 44(6):2661–2694
Guo C, Yang H, Lv J (2017) Robust variable selection in high-dimensional varying coefficient models based on weighted composite quantile regression. Statistical Papers 58(4):1009–1033
Grant MC, Boyd SP (2013) CVX: Matlab software for disciplined convex programming, version 2.0 beta. http://cvxr.com/cvx
Grant MC, Boyd SP (2008) Graph implementations for nonsmooth convex programs. In: Recent advances in learning and control, pp 95–110
Huang CC, Liu K, Pope RM, Du P, Lin S, Rajamannan NM et al (2011) Activated TLR signaling in atherosclerosis among women with lower Framingham risk score: the Multi-Ethnic Study of Atherosclerosis. PLoS One 6(6):e21067
Huber PJ (1964) Robust estimation of a location parameter. The Annals of Mathematical Statistics 35:73–101
Huber PJ (1983) Minimax aspects of bounded-influence regression. Journal of the American Statistical Association 78(381):66–72
Kim M, Lee S (2016) Nonlinear expectile regression with application to value-at-risk and expected shortfall estimation. Computational Statistics & Data Analysis 94:1–19
Liu Y, Zeng P, Lin L (2020) Degrees of freedom for regularized regression with Huber loss and linear constraints. Statistical Papers 1–23
Loh PL, Wainwright MJ (2015) Regularized M-estimators with nonconvexity: statistical and algorithmic theory for local optima. Journal of Machine Learning Research 16:559–616
Maronna RA, Martin RD, Yohai VJ (2019) Robust statistics: theory and methods (with R). John Wiley & Sons
Massart P (2007) Concentration inequalities and model selection. Springer, Berlin
Newey WK, Powell JL (1987) Asymmetric least squares estimation and testing. Econometrica: Journal of the Econometric Society 819–847
Parikh N, Boyd S (2014) Proximal algorithms. Foundations and Trends in Optimization 1(3):127–239
Rigby RA, Stasinopoulos DM (1996) A semi-parametric additive model for variance heterogeneity. Statistics and Computing 6(1):57–65
Rivasplata O (2012) Subgaussian random variables: An expository note. Internet publication, PDF
Smucler E, Yohai VJ (2017) Robust and sparse estimators for linear regression models. Computational Statistics & Data Analysis 111:116–130
Sobotka F, Kneib T (2012) Geoadditive expectile regression. Computational Statistics & Data Analysis 56(4):755–767
Tibshirani R (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58(1):267–288
Waltrup LS, Sobotka F, Kneib T, Kauermann G (2015) Expectile and quantile regression: David and Goliath? Statistical Modelling 15(5):433–456
Wang L, Wu Y, Li R (2012) Quantile regression for analyzing heterogeneity in ultra-high dimension. Journal of the American Statistical Association 107(497):214–222
Wang L, Zheng C, Zhou W, Zhou W (2020) A new principle for tuning-free Huber regression. Statistica Sinica
Yao Q, Tong H (1996) Asymmetric least squares regression estimation: a nonparametric approach. Journal of Nonparametric Statistics 6(2–3):273–292
Zhang CH (2010) Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics 38(2):894–942
Zhao J, Chen Y, Zhang Y (2018) Expectile regression for analyzing heteroscedasticity in high dimension. Statistics & Probability Letters 137:304–311
Zou H, Li R (2008) One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics 36(4):1509–1533
Ziegel JF (2016) Coherence and elicitability. Mathematical Finance 26(4):901–918
Acknowledgements
The authors thank the Editor and anonymous referees for their valuable comments and suggestions. This research was partly supported by the National Statistical Science Research Project (No. 2018LY30), the Zhejiang Provincial Natural Science Foundation (No. LY18A010005) and the Research Project of Humanities and Social Science of the Ministry of Education of China (No. 17YJA910003).
Appendix
Lemma 1
The loss function \(\phi (\cdot )\) defined in (2.1) is continuously differentiable. Moreover, for any \(r,r_0\in \mathbb {R}\), we have
Proof
Details can be found in Gu and Zou (2016).
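For the reader's convenience, with the convention \(\phi _{\alpha }(r)=|\alpha -\mathbb {I}(r<0)|\,r^2\) (our restatement; the definition in (2.1) may use a different but equivalent normalization), the bound in question takes the form \(\min \{\alpha ,1-\alpha \}(r-r_0)^2\le \phi _{\alpha }(r)-\phi _{\alpha }(r_0)-\phi '_{\alpha }(r_0)(r-r_0)\le \max \{\alpha ,1-\alpha \}(r-r_0)^2\), where \(\phi '_{\alpha }(r)=2|\alpha -\mathbb {I}(r<0)|\,r\).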
Proof of Theorem 1
For simplicity and convenience of notation, we suppress the dependence on the pre-determined parameters \(C_u,~C_l\) and write \({\tilde{\varvec{\beta }}}^*:=\varvec{\beta }^*(C_u,C_l)\) for short.
where the last inequality follows by Condition 2.
On the other hand, \({\tilde{\varvec{\beta }}}^*=\underset{\varvec{\beta } \in \mathbb {R}^{p}}{\arg \min }~\mathbb {E}\psi _{\alpha }(y-\mathbf {x}'\varvec{\beta };C_u,C_l)\). Then
where \(g_{\alpha }(\cdot )\) is defined as follows
Note \(g_{\alpha }(r)\) is continuous and differentiable and
So by the mean value theorem, there exists some \({\tilde{\varvec{\beta }}}\) on the line segment between \({\tilde{\varvec{\beta }}}^*\) and \(\varvec{\beta }^*\) such that
where \({\tilde{r}}=|y-\mathbf {x}'{\tilde{\varvec{\beta }}}|\) and \(C=\min \{C_u,|C_l|\}\).
Denote by \(P_{\epsilon }\) the conditional distribution of \(\epsilon \) given \(\mathbf {x}\) and by \(\mathbb {E}_{\epsilon }\) the corresponding conditional expectation. Then we have
where the second to last inequality is obtained by Markov's inequality.
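Concretely, assuming Condition 1 provides a conditional kth moment bound \(\mathbb {E}_{\epsilon }[|\epsilon |^k]\le M_k\) (our reading of the condition), the Markov step is of the form \(P_{\epsilon }(|\epsilon |>t)\le \mathbb {E}_{\epsilon }[|\epsilon |^k]/t^k\le M_k/t^k\) for any \(t>0\).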
Therefore, \(\mathbb {E}[\phi _{\alpha }(y-\mathbf {x}'{\tilde{\varvec{\beta }}}^*)-\phi _{\alpha }(y-\mathbf {x}'\varvec{\beta }^*)]\) is further bounded by
As for the first term, by Conditions 1 and 2,
As for the second term,
since \(\mathbf {x}'(\varvec{\beta }^*-{\tilde{\varvec{\beta }}})\) is sub-Gaussian by Condition 3, and hence its 2kth moment is bounded by \(c^2\kappa _0^{2k}\) for a positive constant c depending only on k.
Combining these results, we have
This completes the proof. \(\square \)
Lemma 2
Let \(\xi _1,\xi _2,\ldots ,\xi _n\) be independent real valued random variables. Assume that there exist positive numbers \(\nu \) and c such that
and for all integers \(k\ge 3\)
Let \(S_n=\sum _{i=1}^n(\xi _i-\mathbb {E}[\xi _i])\), then for every positive x,
Proof
Details can be found in Proposition 2.9 of Massart (2007).
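For reference, a classical unnormalized statement of this type (our restatement based on Proposition 2.9 of Massart 2007) is: if \(\sum _{i=1}^n\mathbb {E}[\xi _i^2]\le \nu \) and \(\sum _{i=1}^n\mathbb {E}[(\xi _i)_+^k]\le \frac{k!}{2}\nu c^{k-2}\) for all integers \(k\ge 3\), then \(\mathbb {P}(S_n\ge \sqrt{2\nu x}+cx)\le e^{-x}\) for every positive x; the application in the proof of Theorem 2 below works with quantities rescaled by n, such as \(\sqrt{2\nu t/n}\).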
Lemma 3
Under Conditions 1–3, there exist universal positive constants \(\kappa _1',\kappa _2', c_1', c_2'\) such that, with probability at least \(1-c_1'\exp {(-c_2'n)}\),
for all \(\Vert \varvec{\beta }\Vert _2\le 4\rho _2\) and \(\Vert \varDelta \Vert _2\le 8\rho _2\), where \(\rho _2\) is a sufficiently large constant depending on R, and \(C=\max \{C_u,|C_l|\}\ge c_u\rho _2^{-1}\), where \(c_u\) is a positive constant depending on \(M_k,\kappa _l,\kappa _u\) and \(\kappa _0\).
Proof
Define the set \( A=\left\{ (\varvec{\beta },\varDelta ):\Vert \varvec{\beta }\Vert _2\le 4\rho _2, ~\Vert \varDelta \Vert _2\le 8\rho _2\right\} \), then we can show that for any \((\varvec{\beta },\varDelta )\in A\),
for some properly chosen T and \(\tau \) satisfying \(T+8\tau \rho _2\le \min \{C_u,|C_l|\}\), where the threshold function
To show (7.10), if \(|y_i-\mathbf {x}_i'\varvec{\beta }|>T\) or \(|\mathbf {x}_i'\varDelta |>\tau \Vert \varDelta \Vert _2\), the right hand side of (7.10) is 0. So by convexity of the robust loss function (2.2), (7.10) holds trivially. If \(|y_i-\mathbf {x}_i'\varvec{\beta }|\le T\) and \(|\mathbf {x}_i'\varDelta |\le \tau \Vert \varDelta \Vert _2\), then by Lemma 1,
Based on this inequality for \(\delta L_n(\varvec{\beta },\varDelta )\), we follow a proof procedure similar to that of Lemma 2 in Fan et al. (2017) and obtain the desired result.
Lemma 4
Suppose \(L_n(\varvec{\beta })\) is convex and that \({\tilde{\varvec{\beta }}}^*\) and \({\tilde{\varvec{\beta }}}^*+\varDelta \) lie in the feasible set, so that \(\Vert \varDelta \Vert _1\le 2R\). If the Restricted Strong Convexity condition holds for \(\Vert \varDelta \Vert _2\le 1\) and \(n\ge 4R^2 \tau _1^2\log p\), then
Proof
Details can be found in Lemma 8 of Loh and Wainwright (2015).
Proof of Theorem 2
If we can prove the following two claims, then by Theorem 1 of Loh and Wainwright (2015) the theorem holds:
- Claim I: \(\Vert \nabla L_n({\tilde{\varvec{\beta }}}^*)\Vert _{\infty }\le \lambda L/4\) with overwhelming probability;
- Claim II: the empirical loss \(L_n(\varvec{\beta })\) satisfies the Restricted Strong Convexity condition.
For Claim I, we use the Bernstein inequality (Lemma 2) and a union bound to establish the result. Through a straightforward calculation,
where \(\psi '_{\alpha }(\cdot )\) is defined in Eq. 3.3.
Note that \(\left| \psi '_{\alpha }(r;C_u,C_l)\right| \le 2\max \{\alpha ,1-\alpha \}|r|\) so that for \(j=1,\ldots ,p\),
Then
where the last inequality follows by an argument similar to that in the proof of Theorem 1, and \(\nu \) is a constant depending on \(\kappa _0\) and \(M_2\).
Denote \(C=2\max \{\alpha ,1-\alpha \}\times \max \{C_u,|C_l|\}\) and let \(A=\frac{|\psi '_{\alpha }(y_i-\mathbf {x}_i'{\tilde{\varvec{\beta }}}^*;C_u,C_l)|}{C}\); then \(|A|\le 1\). By Theorem 2.7 of Rivasplata (2012), \(A\mathbf {x}_{ij}\) is also sub-Gaussian with the same parameter \(\kappa _0\). Then for any \(k\ge 3\), by Proposition 3.2 of Rivasplata (2012), we have
where B is a constant depending on \(\kappa _0\).
Meanwhile by the definition of \({\tilde{\varvec{\beta }}}^*\), \(\mathbb {E}[\psi '_{\alpha }(y_i-\mathbf {x}_i'{\tilde{\varvec{\beta }}}^*;C_u,C_l)\mathbf {x}_{ij}]=0\) for \(j=1,\ldots ,p\). Then by the Bernstein inequality from Lemma 2, we have for \(j=1,\ldots ,p\)
Choosing \(t=\frac{n\lambda ^2L^2}{128\nu }\) and \(C\le \frac{16\nu }{B\lambda L}\), we have \(\frac{CBt}{n}\le \sqrt{\frac{2\nu t}{n}}\) and therefore
Through a union bound argument, we have
Choose \(\lambda =\kappa _{\lambda }\sqrt{\frac{\log p}{n}}\) with \(\kappa _{\lambda }\) large enough that \(c:=\frac{\kappa ^2_{\lambda }L^2}{128\nu }-1>0\); then \(\exp \left( -\frac{n\lambda ^2L^2}{128\nu }+\log p\right) \le \exp \left( -cn\right) \).
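Spelled out, and ignoring absolute constants, this step reads \(\mathbb {P}\big (\Vert \nabla L_n({\tilde{\varvec{\beta }}}^*)\Vert _{\infty }>\lambda L/4\big )\le \sum _{j=1}^{p}\mathbb {P}\big (|\nabla _j L_n({\tilde{\varvec{\beta }}}^*)|>\lambda L/4\big )\le p\exp \left( -\frac{n\lambda ^2L^2}{128\nu }\right) =\exp \left( -\frac{n\lambda ^2L^2}{128\nu }+\log p\right) \) (our sketch of the calculation).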
For Claim II, by Lemma 3, for \(\Vert \varDelta \Vert _2\le 8\rho _2\), with probability at least \(1-c_1'\exp {(-c_2'n)}\),
Using the fact that \(ab\le (a^2+b^2)/2\), we obtain that
with \(\kappa _1=\frac{1}{2}\kappa _1',~~\tau _1=\frac{1}{2}\kappa _1'\kappa _2^2\).
Without loss of generality, we assume \(\rho _2\ge 1/8\). So we have proved the first scenario of the Restricted Strong Convexity, i.e., that the Restricted Strong Convexity condition holds for the empirical loss \(L_n(\varvec{\beta })\) when \(\Vert \varDelta \Vert _2\le 1\). By Lemma 4, when \(n\ge 4R^2 \tau _1^2\log p\), the full Restricted Strong Convexity condition holds.
Then, based on the two claims above and a union bound argument, there exist positive constants \(c_1,c_2\) such that the statistical error bound holds with probability at least \(1-c_1\exp \{-c_2n\}\). \(\square \)